Abstract

This paper outlines a new procedure to perform nonparametric estimation
in hidden Markov models. It is assumed that a Markov chain
$\{X_k\}_{k \ge 0}$ is observed only through a process $\{Y_k\}_{k \ge 0}$,
where $Y_k$ is a noisy observation of $f_\star(X_k)$. We propose a maximum
likelihood based procedure to estimate the function $f_\star$ using a block
of observations $Y_{0:2n-1}$. This paper shows the identifiability of the
model under several assumptions on the Markov chain and on the function
$f_\star$. We also provide a proof of the consistency of the estimator of
$f_\star$ as the number of observations grows to infinity. This consistency
result relies on the Hellinger consistency of an estimator of the
likelihood of the observations. Finally, we provide numerical experiments
to highlight the performance of the estimator.

1 Introduction 

A bivariate stochastic process $\{(X_k, Y_k)\}_{k \ge 0}$ is said to be a
hidden Markov model (HMM) if the state sequence $\{X_k\}_{k \ge 0}$ is a
Markov chain, if the observations $\{Y_k\}_{k \ge 0}$ are independent
conditionally on $\{X_k\}_{k \ge 0}$ and if the conditional distribution of
$Y_k$ given the state sequence depends only on $X_k$. These models can be
applied in a large variety of disciplines such as financial econometrics
([19]), biology ([6]) or speech recognition ([15]).

In this paper, the state-space of the Markov chain $\{X_k\}_{k \ge 0}$ is
assumed to be a compact subset $K$ of $\mathbb{R}^m$ homeomorphic to a
convex subset of $\mathbb{R}^m$ with a Lipschitz boundary. This Markov
chain is a random walk with increment distribution known up to a scaling
factor $a_\star$. The observations are given, for any $k \ge 0$, by
$Y_k = f_\star(X_k) + \epsilon_k$, where $f_\star$ is a function on $K$
taking values in $\mathbb{R}^\ell$ and the measurement noise
$\{\epsilon_k\}_{k \ge 0}$ is an i.i.d. sequence of Gaussian random vectors
on $\mathbb{R}^\ell$ with known covariance matrix. The aim of this paper is
to estimate the function $f_\star$ and the parameter $a_\star$ using only
the observations $\{Y_k\}_{k \ge 0}$.

In regression models such as errors-in-variables models, the variables
$\{X_k\}_{k \ge 0}$ are observed through a sequence $\{Z_k\}_{k \ge 0}$
given by $Z_k = X_k + \eta_k$, where the random variables
$\{\eta_k\}_{k \ge 0}$ are i.i.d. with known distribution. Many solutions
have been proposed to solve this regression problem using an estimation of
the probability density of $X_0$ (this is the deconvolution problem), see
[4], [5] and [14] for an estimation based on kernel density estimators; see
also [7] for an estimation based on the minimization of a penalized
contrast. Nevertheless, all these works rely on the assumption that the
process $\{X_k\}_{k \ge 0}$ is directly observed, which is not the case in
our model.

When $\{X_k\}_{k \ge 0}$ is a Markov chain, [17] proposed an estimation of
the density of the invariant probability and of the Markov kernel of
$\{X_k\}_{k \ge 0}$ when the chain is observed. The estimation procedure
amounts to minimizing a penalized contrast in order to minimize the
empirical $L_2$-norm of the error. [16] provided an extension of this work
in the HMM framework when the observations are given by

    $Y_k = X_k + \epsilon_k$ ,

where the random variables $\{\epsilon_k\}_{k \ge 0}$ are i.i.d. with known
distribution. These works provide estimation procedures of the Markov chain
$\{X_k\}_{k \ge 0}$ but there does not exist any result on the
nonparametric estimation problem studied in this paper.

This problem is motivated by an application to localization using radio
measurements (see [H]). In this case, at each time step $k$, a mobile
device observes the power of signals transmitted by $\ell$ antennas; this
measurement is denoted by $Y_k$. The localization of the device is denoted
by $X_k$ and is assumed to be a Markov chain on a subset of $\mathbb{R}^2$.
The problem consists in estimating the localizations $\{X_k\}_{k \ge 0}$
only observing the signal powers $\{Y_k\}_{k \ge 0}$. In this application,
$f_\star$ represents the average propagation model, which means that the
variable $Y_k$ follows the normal distribution on $\mathbb{R}^\ell$,
$\mathcal{N}(f_\star(X_k), \sigma^2 I_\ell)$. An accurate estimation of the
positions $\{X_k\}_{k \ge 0}$, using particle filtering for instance,
relies on a good estimation of $f_\star$.

The main result of this paper is the identifiability of the model. We
assume that the Markov chain $\{X_k\}_{k \ge 0}$ is stationary with known
(up to a scaling factor $a_\star$) transition kernel, and that $f_\star$ is
a diffeomorphism on its image (which necessarily implies that
$m \le \ell$). We assume in addition that $f_\star$ is smooth in the sense
that it belongs to some Sobolev space $W^{s,p}$ (see (5)). Provided that
$f$ is continuously differentiable and is such that $(f(X_0), f(X_1))$ and
$(f_\star(X_0), f_\star(X_1))$ have the same distribution, we show that
there exists an isometric transformation $\phi$ on the state-space $K$ such
that $f = f_\star \circ \phi$. A key step is to show that
$f_\star^{-1} \circ f$ is necessarily bijective, which is done using
algebraic topology and measure theoretic arguments.

Our estimator $\hat f_n$ is defined as a maximizer of a penalized pairwise
likelihood on the Sobolev space $W^{s,p}$. The parameters $s$ and $p$ of
the Sobolev space are assumed to satisfy $s > m/p + 1$ and $K$ is assumed
to be compact to allow the use of classical Sobolev embeddings into the
space of continuously differentiable functions on $K$. This estimator of
$f_\star$ is associated to an estimator $\hat p_n$ of the marginal
distribution of a pair of observations (see (12)). We prove that the
Hellinger distance between $\hat p_n$ and the true distribution of a pair
of observations under $(f_\star, a_\star)$ vanishes as the number of
observations grows to infinity. More precisely, we prove that the rate of
convergence of $\hat p_n$, in Hellinger distance, can be chosen as close as
possible to $n^{-1/2}$. The consistency of $(\hat f_n, \hat a_n)$ follows
as a consequence together with the identifiability result and continuity
properties. To analyze the asymptotic properties of our estimators, we
need, as it is now well understood, deviation inequalities for the
empirical process of the observations. To that purpose, we use the
concentration inequality for additive functionals of Markov chains proved
in pQ and the maximal inequality for dependent processes of [10] to have a
control on the supremum of a function-indexed empirical process.

Our results are supported by numerical experiments: in the case where
the scaling parameter $a_\star$ is known and $m = 1$, we provide an
Expectation-Maximization based algorithm to compute $\hat f_n$, see
Section 4. We show that the estimation procedure can be solved using a
differential equation. We provide several simulations that show the
efficiency of our method.

In Section 2, the model, the estimators and the assumptions are presented.
The main results are displayed in Section 3: the identifiability of the
model in Section 3.1 and the consistency of the estimator along with a rate
of convergence in Section 3.2. The algorithm and numerical experiments are
displayed in Section 4. Section 5 gathers important proofs on the
identifiability and consistency needed to state the main results.
Additional technical results are provided in the appendices and in the
supplement paper [12].



2 Model and definitions 

Let $\ell$ and $m$ be positive integers and $K$ be a subset of
$\mathbb{R}^m$. The main statistical problem considered in this paper is
the estimation of an unknown target function
$f_\star : K \to \mathbb{R}^\ell$ when observing a process
$\{Y_k\}_{k \in \mathbb{N}}$ such that, for any $k \ge 0$, $Y_k$ belongs to
$\mathbb{R}^\ell$ and satisfies

    $Y_k = f_\star(X_k) + \epsilon_k$ .

$\{\epsilon_k\}_{k \in \mathbb{N}}$ is assumed to be an i.i.d. Gaussian
process with common known distribution $\mathcal{N}(0, \sigma^2 I_\ell)$,
$I_\ell$ being the identity matrix of size $\ell$ and $\sigma^2$ a fixed
positive parameter. Denote by $\varphi$ the probability density of
$\epsilon_0$, i.e.

    $\forall z \in \mathbb{R}^\ell$, $\varphi(z) = (2\pi\sigma^2)^{-\ell/2} \exp\{-\|z\|^2 / (2\sigma^2)\}$ ,

where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^m$ (we use the same
notation for the Euclidean norm on $\mathbb{R}^\ell$).
$\{X_k\}_{k \in \mathbb{N}}$ is assumed to be a non observed Markov chain,
taking its values in $K$ and independent of
$\{\epsilon_k\}_{k \in \mathbb{N}}$. In the sequel, all the density
functions are with respect to the Lebesgue measure on $K$, denoted by
$\mu$. For any $a > 0$, denote by $q_a$ the transition density on $K$
defined, for all $x, x' \in K$, by

    $q_a(x, x') = C_a(x)\, q(\|x - x'\| / a)$ ,    (1)

where $q$ is a known, positive, continuous and strictly monotone function
on $\mathbb{R}_+$ and where

    $C_a(x) = \left( \int_K q(\|x - x'\| / a)\, dx' \right)^{-1}$ ,    (2)

where $dx'$ is a shorthand notation for $\mu(dx')$. In our numerical
application in Section 4.2 the Gaussian kernel $q(x) = \exp(-x^2/2)$ is
chosen. The Markov transition kernel associated with $q_a$ is denoted by
$Q_a$. Assume the existence of an unknown parameter $a_\star > 0$ such that

H1 $\{X_k\}_{k \ge 0}$ is a stationary Markov chain with transition kernel
   $Q_{a_\star}$.

It follows from H1 that $\{Y_k\}_{k \in \mathbb{N}}$ is stationary. Assume
the following statement on the set $K$:

H2 (i) $K$ is a compact subset of $\mathbb{R}^m$.

(ii) $K$ is homeomorphic to a convex subset of $\mathbb{R}^m$.

(iii) $K$ has a local Lipschitz boundary.

$K$ has a local Lipschitz boundary if, for any $x$ in the boundary
$\partial K$ of $K$, there exists a neighbourhood $V$ of $x$ in
$\partial K$ which is the graph of a Lipschitz function. As an immediate
consequence of the compactness of $K$ and of the positivity of $q$, there
exist $0 < \sigma_-(a) \le \sigma_+(a) < +\infty$ such that, for all
$x, x' \in K$,

    $\sigma_-(a) \le q_a(x, x') \le \sigma_+(a)$ .    (3)

For any $a > 0$, $Q_a$ is a $\psi$-irreducible and recurrent Markov kernel
and then, it has a unique invariant probability distribution (see
[21, Theorem 10.0.1]). By the symmetry of the kernel
$(x, x') \mapsto q(\|x - x'\| / a)$, the finite measure on $K$ with density
function $x \mapsto C_a^{-1}(x)$ is $Q_a$-invariant. Therefore, the unique
invariant probability of $Q_a$ has a density given by

    $\forall x \in K$, $\nu_a(x) = \dfrac{C_a^{-1}(x)}{\int_K C_a^{-1}(x')\, dx'}$ .    (4)
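
The objects in (1), (2) and (4) can be checked numerically. The following is a minimal sketch, assuming $K = [0, 1]$, the Gaussian kernel of Section 4.2 and a midpoint grid with uniform quadrature weights; the grid, the helper names `gaussian_q` and `build_kernel` and the value $a = 1$ are illustrative assumptions, not part of the paper.

```python
import math

def gaussian_q(r):
    """Kernel shape q(r) = exp(-r^2 / 2), the choice made in Section 4.2."""
    return math.exp(-0.5 * r * r)

def build_kernel(a, n_cells=50, q=gaussian_q):
    """Discretize K = [0, 1] into n_cells midpoints (assumption of this sketch).

    Returns the grid, the matrix q_a(x_i, x_j) and the density nu_a(x_i).
    """
    w = 1.0 / n_cells                                   # quadrature weight
    grid = [(i + 0.5) * w for i in range(n_cells)]
    raw = [[q(abs(x - xp) / a) for xp in grid] for x in grid]
    # C_a(x) = 1 / int_K q(|x - x'| / a) dx', equation (2)
    c = [1.0 / (sum(row) * w) for row in raw]
    q_a = [[c[i] * raw[i][j] for j in range(n_cells)] for i in range(n_cells)]
    # nu_a(x) is proportional to 1 / C_a(x), equation (4)
    inv_c = [1.0 / ci for ci in c]
    total = sum(inv_c) * w
    nu = [v / total for v in inv_c]
    return grid, q_a, nu

grid, q_a, nu = build_kernel(a=1.0)
w = 1.0 / len(grid)
# Invariance check: int nu(x) q_a(x, x') dx should reproduce nu(x').
nu_next = [sum(nu[i] * q_a[i][j] for i in range(len(grid))) * w
           for j in range(len(grid))]
max_gap = max(abs(u - v) for u, v in zip(nu, nu_next))
```

On this grid the invariance holds up to rounding, because the discretized kernel $q(|x_i - x_j| / a)$ is symmetric, which is exactly the argument used above.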

Let $p \ge 1$, define

    $L^p = \{ f : K \to \mathbb{R}^\ell ;\ \|f\|_{L^p}^p = \int_K \|f(x)\|^p\, dx < \infty \}$ .

For any $m$-tuple $\alpha = \{\alpha_i\}_{i=1}^m$ of non-negative integers,
we write $|\alpha| = \sum_{i=1}^m \alpha_i$. For any
$f : K \to \mathbb{R}^\ell$ and any $j \in \{1, \dots, \ell\}$, the $j$th
component of $f$ is denoted by $f_j$. Let $s \in \mathbb{N}$ and define
$W^{s,p}$ as the Sobolev space on $K$ with parameters $s$ and $p$, i.e.,

    $W^{s,p} = \{ f \in L^p ;\ D^\alpha f \in L^p,\ \alpha \in \mathbb{N}^m \text{ and } |\alpha| \le s \}$ ,    (5)

where $D^\alpha f : K \to \mathbb{R}^\ell$ represents here the vector of
partial derivatives of order $\alpha$, in the sense of distributions, of
the components $f_j$, for $j \in \{1, \dots, \ell\}$. $W^{s,p}$ is equipped
with the norm $\|\cdot\|_{W^{s,p}}$ defined, for any $f \in W^{s,p}$, by

    $\|f\|_{W^{s,p}} = \left( \sum_{0 \le |\alpha| \le s} \|D^\alpha f\|_{L^p}^p \right)^{1/p}$ .    (6)



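For any subset $\Omega_0$ of $\mathbb{R}^m$, and any $k \ge 0$, let
$C^k(\Omega_0)$ be the vector space of all the functions
$f : \Omega_0 \to \mathbb{R}$ such that there exists an open neighbourhood
$\Omega$ of $\Omega_0$ (if $\Omega_0$ is open we can take
$\Omega = \Omega_0$) in $\mathbb{R}^m$ and a function
$\tilde f : \Omega \to \mathbb{R}$ such that the restriction
$\tilde f|_{\Omega_0}$ of $\tilde f$ to $\Omega_0$ satisfies
$\tilde f|_{\Omega_0} = f$ and $\tilde f$ is $C^k$-regular on $\Omega$,
which means that $\tilde f$ and all its partial derivatives
$D^\alpha \tilde f$ are continuous on $\Omega$. Define, for any $x$ in
$\Omega_0$, $D^\alpha f(x) = D^\alpha \tilde f(x)$. Let
$\|\cdot\|_{C^k(\Omega_0)}$ be the norm on $C^k(\Omega_0)$ defined by
$\|f\|_{C^k(\Omega_0)} = \sup_{|\alpha| \le k} \|D^\alpha f\|_\infty$. We
also define $C^k(\Omega_0, \mathbb{R}^\ell)$ by
$C^k(\Omega_0, \mathbb{R}^\ell) = C^k(\Omega_0)^\ell$.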

Remark 2.1. i) By H2(iii) and the Stein Theorem [1, Theorem 5.24], there
exists a positive constant $C$ such that any bounded function $f$ in
$C^1(\mathring K)$ can be extended by a function $\tilde f$ in
$C^1(\mathbb{R}^m)$, with
$\|\tilde f\|_{C^1(\mathbb{R}^m)} \le C \|f\|_{C^1(\mathring K)}$.

ii) Note that, for any $j \in \{1, \dots, \ell\}$ and $f \in W^{s,p}$,
$f_j$ belongs to $W^{s,p}(K, \mathbb{R})$, the Sobolev space of real-valued
functions with parameters $s$ and $p$. Let $k \ge 0$; by
[1, Theorem 6.3], assuming that $K$ satisfies H2(i) and H2(iii) and
$s > m/p + k$, $W^{s,p}(K, \mathbb{R})$ is compactly embedded into the
subspace of bounded functions in
$(C^k(\mathring K), \|\cdot\|_{C^k(\mathring K)})$. Provided that
$s > m/p + 1$, and arguing component by component, $W^{s,p}$ is compactly
embedded into the subspace of bounded functions of
$C^1(\mathring K, \mathbb{R}^\ell)$. Moreover, the identity function
$\mathrm{id} : W^{s,p} \to C^1(\mathring K, \mathbb{R}^\ell)$ being linear
and continuous, there exists a positive coefficient $\kappa$ such that, for
any $f \in W^{s,p}$,

    $\|f\|_{C^1(\mathring K, \mathbb{R}^\ell)} \le \kappa \|f\|_{W^{s,p}}$ ,    (7)

thus $f$ is a bounded function in $C^1(\mathring K, \mathbb{R}^\ell)$ and,
by i), can be extended by a function in $C^1(K, \mathbb{R}^\ell)$, shortly
denoted by $C^1$, and

    $\|f\|_{C^1} \le \kappa \|f\|_{W^{s,p}}$ .    (8)

H3 s > m/p + 1. 
For any $f \in C^1$ and any $x \in K$, the Jacobian of $f$ at $x$ is
defined by

    $J_f^2(x) = \det[Df(x)^T Df(x)]$ ,

where $Df(x)$ is the $\ell \times m$ gradient matrix of $f$ at $x$ defined,
for any $j \in \{1, \dots, \ell\}$ and any $i \in \{1, \dots, m\}$, by

    $[Df(x)]_{j,i} = \dfrac{\partial f_j}{\partial x_i}(x)$ .

For any sets $E$ and $F$ and any $f : E \to F$, denote by $\mathrm{Im}(f)$
the image in $F$ of $f$, $\mathrm{Im}(f) = f(E)$.

H4 (i) $f_\star \in W^{s,p}$.

(ii) $f_\star : K \to \mathrm{Im}(f_\star)$ is a diffeomorphism.

Remark 2.2. i) We say that a function $f : K \to \mathrm{Im}(f)$ is a
diffeomorphism if there exists an open neighbourhood $V$ of $K$ in
$\mathbb{R}^m$ and a diffeomorphism
$\tilde f : V \to \mathrm{Im}(\tilde f)$ such that $\tilde f|_K = f$.

ii) By H4(ii), for any $x$ in $K$, the linear application $Df_\star(x)$ is
injective and thus, $m \le \ell$.



We now give the definition of the estimators $(\hat f_n, \hat a_n)$ of
$(f_\star, a_\star)$ given $2n$ observations $\{Y_k\}_{k=0}^{2n-1}$. For
practical reasons (see the proof of Proposition 3.6), we assume that
$a_\star \in [a_-, +\infty[$, for a known $a_- > 0$. For all integer
$n \ge 1$, define $(\hat f_n, \hat a_n)$ by

    $(\hat f_n, \hat a_n) = \mathrm{argmax}_{f \in W^{s,p},\, a \ge a_-} \left\{ \frac{1}{n} \sum_{k=0}^{n-1} \ln p_{f,a}(Y_{2k}, Y_{2k+1}) - \lambda_n^2 I^2(f) \right\}$ ,    (9)

where, for all $y_0, y_1$ in $\mathbb{R}^\ell$,

    $p_{f,a}(y_0, y_1) = \int \varphi(y_0 - f(x_0))\, \varphi(y_1 - f(x_1))\, \nu_a(x_0)\, q_a(x_0, x_1)\, dx_0\, dx_1$    (10)
and, for some positive v, 

I 2 (f) = ll/lfc- (11) 
Remark 2.3. By the dominated convergence theorem, the function

    $(f, a) \mapsto \frac{1}{n} \sum_{k=0}^{n-1} \ln p_{f,a}(Y_{2k}, Y_{2k+1})$

is continuous on $C^1 \times [a_-, \infty[$; thus, by (9), (11) and
Remark 2.1, $\hat f_n$ exists and belongs to $C^1$.
Consider the following assumption on $v$.
H5 $v > 2\ell$.

Note that $(f, a) \mapsto \frac{1}{n} \sum_{k=0}^{n-1} \ln p_{f,a}(Y_{2k}, Y_{2k+1})$
does not represent the likelihood of the observations
$\{Y_k\}_{k=0}^{2n-1}$ but what we call the pairwise pseudo-likelihood of
the observations.
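
To make the pairwise pseudo-likelihood concrete, here is a minimal sketch of how $p_{f,a}$ in (10) and the unpenalized criterion in (9) could be approximated. It assumes $K = [0, 1]$, the Gaussian kernel $q(x) = \exp(-x^2/2)$ and a midpoint Riemann sum; the function names and the grid size are illustrative assumptions, and the penalty term is deliberately left out.

```python
import math

def normal_density(z, sigma2):
    """Density of N(0, sigma2 * I) evaluated at the vector z."""
    ell = len(z)
    sq = sum(v * v for v in z)
    return (2.0 * math.pi * sigma2) ** (-ell / 2.0) * math.exp(-sq / (2.0 * sigma2))

def pair_density(f, a, y0, y1, sigma2=1.0, n_cells=40):
    """Riemann-sum approximation of p_{f,a}(y0, y1) from (10) on K = [0, 1]."""
    w = 1.0 / n_cells
    grid = [(i + 0.5) * w for i in range(n_cells)]
    raw = [[math.exp(-0.5 * ((x - xp) / a) ** 2) for xp in grid] for x in grid]
    row_mass = [sum(row) * w for row in raw]            # 1 / C_a(x)
    total = sum(row_mass) * w                           # normalizer of nu_a
    g0 = [normal_density([u - v for u, v in zip(y0, f(x))], sigma2) for x in grid]
    g1 = [normal_density([u - v for u, v in zip(y1, f(x))], sigma2) for x in grid]
    # nu_a(x0) q_a(x0, x1) = q(|x0 - x1| / a) / total, because C_a cancels.
    acc = 0.0
    for i in range(n_cells):
        for j in range(n_cells):
            acc += g0[i] * g1[j] * raw[i][j]
    return acc * w * w / total

def pairwise_log_likelihood(f, a, ys, sigma2=1.0, n_cells=40):
    """(1/n) sum_k ln p_{f,a}(Y_{2k}, Y_{2k+1}) over non-overlapping pairs."""
    n = len(ys) // 2
    return sum(math.log(pair_density(f, a, ys[2 * k], ys[2 * k + 1], sigma2, n_cells))
               for k in range(n)) / n
```

Here `f` maps a point of $[0, 1]$ to a tuple of length $\ell$ and `ys` is a list of such tuples. Only the non-overlapping pairs $(Y_{2k}, Y_{2k+1})$ enter the sum, which is why the criterion is a pseudo-likelihood rather than the full likelihood.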

By (9), $\hat a_n$ could be equal to $\infty$ so that we shall extend our
definitions to this case. By the dominated convergence theorem, for any
$x_0, x_1 \in K$, any $y_0, y_1 \in \mathbb{R}^\ell$ and any measurable
function $f$, $q_a(x_0, x_1)$, $\nu_a(x_0)$ and $p_{f,a}(y_0, y_1)$
converge as $a \to \infty$ to $q_\infty(x_0, x_1)$, $\nu_\infty(x_0)$ and
$p_{f,\infty}(y_0, y_1)$, defined by:

    $q_\infty(x_0, x_1) = \mu(K)^{-1}$ ,
    $\nu_\infty(x_0) = \mu(K)^{-1}$ ,
    $p_{f,\infty}(y_0, y_1) = \mu(K)^{-2} \int \varphi(y_0 - f(x_0))\, dx_0 \int \varphi(y_1 - f(x_1))\, dx_1$ .

Let $\hat p_n$ denote the maximum penalized likelihood estimator (MLE) of
the density on $\mathbb{R}^{2\ell}$ of $(Y_0, Y_1)$, defined by

    $\hat p_n = p_{\hat f_n, \hat a_n}$ .    (12)



The convergence properties of this estimator will be analyzed with the
Hellinger metric, defined, for any probability densities $p_1$ and $p_2$ on
$\mathbb{R}^{2\ell}$, by

    $h(p_1, p_2) = \left[ \frac{1}{2} \int \left( p_1^{1/2}(y) - p_2^{1/2}(y) \right)^2 dy \right]^{1/2}$ .    (13)
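
As an illustration of the Hellinger metric, the sketch below evaluates it on a grid for two one-dimensional Gaussian densities. The normalization by $1/2$ inside the integral is an assumption of this sketch (it is the convention under which $h$ takes values in $[0, 1]$); the helper names and the grid are likewise illustrative.

```python
import math

def hellinger(p1, p2, weight):
    """h(p1, p2) on a common grid, with the 1/2 normalization (assumption)."""
    s = sum((math.sqrt(u) - math.sqrt(v)) ** 2 for u, v in zip(p1, p2))
    return math.sqrt(0.5 * s * weight)

def normal_pdf(y, mean):
    """Density of N(mean, 1) at y."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

step = 0.01
ys = [-10.0 + step * i for i in range(2101)]       # grid on [-10, 11]
p = [normal_pdf(y, 0.0) for y in ys]
q = [normal_pdf(y, 1.0) for y in ys]
h_pq = hellinger(p, q, step)
```

For two unit-variance Gaussians whose means differ by $\Delta$, this convention gives $h^2 = 1 - \exp(-\Delta^2 / 8)$, which provides a closed-form check of the numerical value.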



Remark 2.4. The reason we use the Sobolev framework instead of directly
considering the space $C^1$ is, first of all, computational. Indeed, as we
will see in Section 4, the Sobolev norm chosen in penalty (11) can be
easily manipulated compared with the $C^1$ norm. Moreover, Theorem 3.5
ensures that $\|\hat f_n\|_{W^{s,p}}$ stays bounded and thus, by
Remark 2.1, that $\{\hat f_n\}_{n \ge 1}$ lies in a compact subset of
$C^1$. This plays a key role in the proof of Theorem 3.7.

Section 3 provides the main results of the paper. Theorem 3.1 establishes
the identifiability of our model. Then, the Hellinger consistency of the
MLE (12) is shown in Theorem 3.5. This result does not imply, a priori, the
consistency of the estimators $(\hat f_n, \hat a_n)$ defined by (9).
However, by Theorem 3.1, whenever the MLE is consistent, so is
$(\hat f_n, \hat a_n)$ up to an isometric transformation on the state space
$K$. The consistency of $(\hat f_n, \hat a_n)$ is given by Theorem 3.7.



3 Main results 



3.1 Identifiability 

We denote by $\mathcal{I}$ the set of all the isometries of $K$. For any
functions $f$ and $h$ defined on $K$ we write
$f \overset{\mathcal{I}}{\sim} h$ and say that $f$ and $h$ are in the same
equivalence class modulo the isometric transformations of $K$, if and only
if there exists an isometry $\phi$ on $K$ such that $f = h \circ \phi$. In
the sequel, for any random variables $X$ and $Y$, we write
$X \overset{\mathcal{D}}{=} Y$ if $X$ and $Y$ have the same distribution.

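Theorem 3.1. Assume h\j§^ and Let $f : K \to \mathbb{R}^\ell$ be $C^1$ and
$0 < b \le \infty$. Assume also that
$h(p_{f,b}, p_{f_\star, a_\star}) = 0$, where $p_{f,b}$ and
$p_{f_\star, a_\star}$ are defined by (10). Then, $b = a_\star$ and
$f \overset{\mathcal{I}}{\sim} f_\star$.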
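
The equivalence class in Theorem 3.1 cannot be reduced further: composing $f$ with an isometry of $K$ leaves the pair density (10) unchanged. The following minimal sketch checks this numerically for $K = [0, 1]$, $\ell = 1$, the Gaussian kernel and the isometry $x \mapsto 1 - x$; the grid, the function `pair_density` and the chosen values of $y_0$, $y_1$ are illustrative assumptions.

```python
import math

def pair_density(f, a, y0, y1, sigma2=1.0, n_cells=40):
    """Riemann-sum approximation of p_{f,a}(y0, y1) for scalar f on K = [0, 1]."""
    w = 1.0 / n_cells
    grid = [(i + 0.5) * w for i in range(n_cells)]

    def noise(z):
        return math.exp(-z * z / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

    acc, total = 0.0, 0.0
    for x0 in grid:
        for x1 in grid:
            k = math.exp(-0.5 * ((x0 - x1) / a) ** 2)   # q(|x0 - x1| / a)
            total += k
            acc += noise(y0 - f(x0)) * noise(y1 - f(x1)) * k
    return acc / total      # the quadrature weights w * w cancel in the ratio

def f(x):
    return 3.0 * x

def f_flipped(x):
    return f(1.0 - x)       # f composed with the isometry x -> 1 - x

p_f = pair_density(f, 1.0, 0.4, 2.1)
p_flip = pair_density(f_flipped, 1.0, 0.4, 2.1)
p_shift = pair_density(lambda x: f(x) + 1.0, 1.0, 0.4, 2.1)   # not of the form f o isometry
```

The first two values agree up to rounding, while the shifted function, which is not obtained from $f$ by an isometry of $K$, yields a different density.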



Proof. The proofs of the intermediate lemmas are postponed to Section 5.1.

Let $0 < b \le \infty$ and $f \in C^1$ be such that
$h(p_{f,b}, p_{f_\star, a_\star}) = 0$. Let $\{X'_k\}_{k \ge 0}$ be a
Markov chain with initial distribution $\nu_b$ and transition kernel $Q_b$.
Consider also $\{\epsilon'_k\}_{k \ge 0}$ a sequence of independent
$\mathcal{N}(0, \sigma^2 I_\ell)$ random variables, independent from
$\{X'_k\}_{k \ge 0}$. Define, for any $k \ge 0$,
$Y'_k = f(X'_k) + \epsilon'_k$. If $h(p_{f,b}, p_{f_\star, a_\star}) = 0$,
then, for any $k \ge 0$,
$(Y_k, Y_{k+1}) \overset{\mathcal{D}}{=} (Y'_k, Y'_{k+1})$. The density
$\varphi$ being known, this yields

    $(f(X'_k), f(X'_{k+1})) \overset{\mathcal{D}}{=} (f_\star(X_k), f_\star(X_{k+1}))$ .    (14)

(14) and the irreducibility of the Markov chains $\{X_k\}_{k \ge 0}$ and
$\{X'_k\}_{k \ge 0}$ imply that $\mathrm{Im}(f) = \mathrm{Im}(f_\star)$. By
H4, $f_\star$ is a diffeomorphism. Let $f_\star^{-1}$ denote its inverse
function and define

    $\phi = f_\star^{-1} \circ f$ .    (15)

Since $f_\star$ is a diffeomorphism and $f \in C^1$, $\phi \in C^1$. The
purpose of the following lemmas is to prove that $\phi$ is bijective on $K$
and that, for any $x$ in $K$, $J_\phi(x) > 0$, which is shown in Lemma 3.2.



Lemma 3.2. Assume L^j^ and For all $x \in K$, $J_\phi(x) > 0$, where $\phi$
is defined by (15).

Then, we show that $\phi$ is necessarily a covering map of $K$ (see
definition below) and that, under H2(ii), any covering map of $K$ is a one
to one function. These results are established in Lemma 3.3 and Lemma 3.4.

$\phi : K \to K$ is said to be a covering map if and only if (see
[T51 Chapter 11])

(i) $\phi$ is continuous.

(ii) $\phi$ is surjective.

(iii) For every $y \in K$, there exists an open neighbourhood $V$ of $y$
and a family $\{O_i\}_{i \in I}$ of disjoint open subsets of $K$ such that
$\phi^{-1}(V) = \bigcup_{i \in I} O_i$, with $O_i$ mapped homeomorphically
onto $V$ by $\phi$, for all $i \in I$.



Lemma 3.3. Assume L\£^ and ^{^} Then, the function $\phi$ defined by (15)
is a covering map.

Lemma 3.4. Assume H2. Then, every covering map $\phi : K \to K$ is a one to
one function.
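By Lemma 3.3 and Lemma 3.4, $\phi$, defined by (15), is bijective; denote
by $\phi^{-1}$ the inverse function of $\phi$. By Lemma 3.2, $J_\phi > 0$
on $K$ and thus $\phi^{-1} \in C^1$. By (14), for all $x \in K$ and all
positive measurable functions $h$ on $K$,

    $Q_{a_\star}(x, h) = E[h(X_k) | X_{k-1} = x] = E[h \circ \phi(X'_k) | X'_{k-1} = \phi^{-1}(x)] = Q_b(\phi^{-1}(x), h \circ \phi)$ .

Moreover,

    $Q_b(\phi^{-1}(x), h \circ \phi) = \int_K h \circ \phi(u)\, q_b(\phi^{-1}(x), u)\, du = \int_K h(x')\, q_b(\phi^{-1}(x), \phi^{-1}(x'))\, |J_{\phi^{-1}}(x')|\, dx'$ .

Then, by continuity, for all $(x, x') \in K^2$,

    $q_{a_\star}(x, x') = q_b(\phi^{-1}(x), \phi^{-1}(x'))\, |J_{\phi^{-1}}(x')|$ .

This equation directly leads to $b < \infty$. Indeed, if $b = \infty$, the
left side of the equation depends on $x$ (since $a_\star < \infty$) whereas
the right side does not. We can now suppose $0 < b < \infty$. By (1),

    $C_{a_\star}(x)\, q\left( \frac{\|x - x'\|}{a_\star} \right) = C_b(\phi^{-1}(x))\, q\left( \frac{\|\phi^{-1}(x) - \phi^{-1}(x')\|}{b} \right) |J_{\phi^{-1}}(x')|$ .    (16)

Therefore, for all $x \in K$, applying (16) with $x' = x$ yields

    $|J_{\phi^{-1}}(x)| = \dfrac{C_{a_\star}(x)}{C_b(\phi^{-1}(x))}$ .    (17)

By H2(i)-(ii), Schauder's theorem (see [53]) states that there exists
$x_0 \in K$ such that $\phi(x_0) = x_0$. By (16) applied at $x_0$, there
exists a constant $C$ such that, for all $x \in K$,

    $|J_{\phi^{-1}}(x)| = C\, \dfrac{q(\|x - x_0\| / a_\star)}{q(\|\phi^{-1}(x) - \phi^{-1}(x_0)\| / b)}$ .

Plugging this expression and (17) in (16) yields

    $q\left( \frac{\|x - x_0\|}{a_\star} \right) q\left( \frac{\|x' - x\|}{a_\star} \right) q\left( \frac{\|\phi^{-1}(x') - \phi^{-1}(x_0)\|}{b} \right)$
        $= q\left( \frac{\|\phi^{-1}(x) - \phi^{-1}(x_0)\|}{b} \right) q\left( \frac{\|\phi^{-1}(x') - \phi^{-1}(x)\|}{b} \right) q\left( \frac{\|x' - x_0\|}{a_\star} \right)$ .    (18)

Applied with $x' = x_0$, and since $q$ is positive, we have, for all
$x \in K$,

    $q\left( \frac{\|x - x_0\|}{a_\star} \right) = q\left( \frac{\|\phi^{-1}(x) - \phi^{-1}(x_0)\|}{b} \right)$

and then, since $q$ is a one to one function by assumption and
$\phi^{-1}(x_0) = x_0$,

    $\dfrac{\|x - x_0\|}{a_\star} = \dfrac{\|\phi^{-1}(x) - x_0\|}{b}$ .

Considering the supremum of the last equality for $x \in K$ yields
$b = a_\star$, since $\phi^{-1}$ is a bijection of $K$. Then, (18) gives,
for all $x, x' \in K$,

    $\|\phi^{-1}(x') - \phi^{-1}(x)\| = \|x' - x\|$ .

Therefore, $\phi$ is an isometry and $f = f_\star \circ \phi$, which
concludes the proof.

□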



3.2 Convergence results 



Theorem 3.5 states the Hellinger consistency of the MLE $\hat p_n$ and
ensures that the Sobolev norm of the estimator $\hat f_n$ is bounded.
Theorem 3.1 and Theorem 3.5 lead to the second main result, Theorem 3.7,
which guarantees that $(\hat f_n, \hat a_n)$ is also consistent. The proof
of Theorem 3.5 uses the same classical proof scheme as in the independent
case, see Section 10.2] for an illustration of such a proof. This proof
relies on the control of the empirical process: it requires both a result
on the concentration of the empirical process and a maximal inequality.
Unfortunately, the tools used in the independent case such as the Bernstein
or the Hoeffding inequalities do not hold in our model and similar results
in the dependent case have to be used, see pQ. Denote by $P_\star$ the
distribution of $\{Y_k\}_{k \ge 0}$ under the true parameters
$(f_\star, a_\star)$. For any sequence of random variables
$\{Z_n\}_{n \ge 0}$ and any sequence of positive numbers
$\{a_n\}_{n \ge 0}$, we write $Z_n = O_{P_\star}(a_n)$ if

    $\lim_{T \to +\infty} \limsup_{n \to +\infty} P_\star\{ |Z_n| > T a_n \} = 0$ .

Theorem 3.5. Assume H1-H4 and H5. Let $(\hat f_n, \hat a_n)$ be defined by
(9) and $I(f)$ by (11). Then, provided that

    $\lambda_n \to 0$ and $\lambda_n^2 n^{1/2} \to \infty$ as $n \to +\infty$ ,    (19)

we have

    $h^2(\hat p_n, p_{f_\star, a_\star}) = O_{P_\star}(\lambda_n^2)$ and $I^2(\hat f_n) = O_{P_\star}(1)$ .    (20)

Sketch of proof. The proof relies on a basic inequality which controls the
Hellinger risk $h^2(\hat p_n, p_{f_\star, a_\star})$ and the complexity of
the estimator $I^2(\hat f_n)$ by the empirical process, see (22). The
control of this empirical process will be done in Proposition 3.6. We set,
for any density function $p$ on $\mathbb{R}^{2\ell}$,

    $g_p = \dfrac{1}{2} \ln \dfrac{p + p_{f_\star, a_\star}}{2\, p_{f_\star, a_\star}}$ .    (21)

Let $P_n$ be the empirical distribution based on the observations
$\{(Y_{2k}, Y_{2k+1})\}_{k=0}^{n-1}$, i.e., for any measurable set $A$ of
$\mathbb{R}^{2\ell}$,

    $P_n(A) = \dfrac{1}{n} \sum_{k=0}^{n-1} 1_A(Y_{2k}, Y_{2k+1})$ .
By (9) and (12), the basic inequality of [551 Lemma 10.5] states that:

    $h^2(\hat p_n, p_{f_\star, a_\star}) + 4 \lambda_n^2 I^2(\hat f_n) \le 16 \int g_{\hat p_n}\, d(P_n - P_\star) + 4 \lambda_n^2 I^2(f_\star)$ .    (22)

Therefore, a control of the term $\int g_{\hat p_n}\, d(P_n - P_\star)$ in
the right hand side of (22) will provide simultaneously a bound on the
growth of $h^2(\hat p_n, p_{f_\star, a_\star})$ and $I^2(\hat f_n)$.

The empirical process indexed by $W^{s,p}$ is defined, for any
$f \in W^{s,p}$ and any $a \ge a_-$, by

    $\nu_n(g_{p_{f,a}}) = \sqrt{n} \int g_{p_{f,a}}\, d(P_n - P_\star)$ ,

where $g_{p_{f,a}}$ is defined by (21). Proposition 3.6 provides a
deviation inequality for the supremum of the normalized empirical process.

Proposition 3.6. Assume and TTiere exist some positive 

constants K , E and T such that, for any x > 0, 



gup K(fl P/ ,JI 
P(J) V 1 



> T + x } <Ke 



-T.3L 



(23) 



Proposition 3.6 is proved in Section 5.2 below. It ensures that

    $\sup_{f \in W^{s,p},\, a \ge a_-} \dfrac{\left| \int g_{p_{f,a}}\, d(P_n - P_\star) \right|}{I^2(f) \vee 1} = O_{P_\star}(n^{-1/2})$ .

Plugging this bound into (22) gives

    $\left( 4 + O_{P_\star}(n^{-1/2} \lambda_n^{-2}) \right) I^2(\hat f_n) \le 4 I^2(f_\star) + O_{P_\star}(n^{-1/2} \lambda_n^{-2})$ .

By (19), this establishes the second statement of (20). Combining this
result with (22) gives:

    $h^2(\hat p_n, p_{f_\star, a_\star}) \le O_{P_\star}(n^{-1/2}) + O_{P_\star}(\lambda_n^2)$ ,

which proves the first statement of (20) and concludes the proof of
Theorem 3.5.

□



Equations (19) and (20) give a rate of convergence of
$h^2(\hat p_n, p_{f_\star, a_\star})$. This rate of convergence is slower
than $n^{-1/2}$ but can be chosen as close as wanted to $n^{-1/2}$, e.g. we
can choose $\lambda_n^2 = n^{-1/2} \ln n$.
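
A quick numerical look at the example schedule $\lambda_n^2 = n^{-1/2} \ln n$ shows both requirements of (19) at work. This is only an illustrative sketch; the helper name and the sample sizes are arbitrary choices.

```python
import math

def penalty_squared(n):
    """Example schedule lambda_n^2 = n^(-1/2) * ln(n)."""
    return math.log(n) / math.sqrt(n)

# For each n: (n, lambda_n^2, lambda_n^2 * sqrt(n)).
schedule = [(n, penalty_squared(n), penalty_squared(n) * math.sqrt(n))
            for n in (10**2, 10**4, 10**6)]
```

The second entry of each triple decreases towards 0 while the third, equal to $\ln n$, keeps growing, as required.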

On the other hand, $I^2(\hat f_n) = O_{P_\star}(1)$ and the Sobolev
embedding described in Remark 2.1 ensures that $\hat f_n$ belongs to some
compact subset of $C^1$ with probability converging to 1 as $n$ tends to
$\infty$. Let $d_{C^1}$ denote the distance function on $C^1$ associated
with the norm $\|\cdot\|_{C^1}$. Let also $\mathcal{F}_\star$ be the set of
all the functions $f$ in the same equivalence class as $f_\star$ modulo the
isometric transformations of $K$, i.e.

    $\mathcal{F}_\star = \{ f ;\ f \overset{\mathcal{I}}{\sim} f_\star \}$ .    (24)
Theorem 3.7. Assume H1-H5. Let $(\hat f_n, \hat a_n)$ be defined by (9) and
$I(f)$ by (11). Then, provided that

    $\lambda_n \to 0$ and $\lambda_n^2 n^{1/2} \to \infty$ as $n \to +\infty$ ,

we have

    $d_{C^1}(\hat f_n, \mathcal{F}_\star) \to 0$ and $\hat a_n \to a_\star$ in $P_\star$-probability as $n \to +\infty$ ,    (25)

where $\mathcal{F}_\star$ is defined by (24).



Proof. We prove (25) by introducing the Alexandroff compactification
$[a_-, \infty]$ of $[a_-, \infty[$ and a distance function on this set such
that $[a_-, \infty]$ is compact and metric. Moreover, for any $I > 0$, the
set $B_{W^{s,p}}(0, I)$, defined as the closure in $C^1$ of
$\{ f \in W^{s,p} ;\ I(f) \le I \}$, is a compact subset of $C^1$. Thus,
$B_{W^{s,p}}(0, I) \times [a_-, \infty]$ is a compact subset of
$C^1 \times [a_-, \infty]$ and (25) will result from Theorem 3.5 and
continuity arguments on the function
$(f, a) \mapsto h(p_{f,a}, p_{f_\star, a_\star})$.

By Theorem 3.5, for any $\gamma > 0$, there exist $\epsilon > 0$ and
$I > 0$ such that:

    $\limsup_{n \to +\infty} P_\star\{ h^2(\hat p_n, p_{f_\star, a_\star}) > \epsilon \lambda_n^2 \} \le \gamma / 2$    (26)

and

    $\limsup_{n \to +\infty} P_\star\{ I(\hat f_n) > I \} \le \gamma / 2$ .    (27)

Denote by $d$ the distance on $C^1 \times [a_-, \infty]$ defined, for all
$((f, a), (f', a')) \in (C^1 \times [a_-, \infty])^2$, by

    $d((f, a), (f', a')) = d_{C^1}(f, f') + |\arctan(a) - \arctan(a')|$ ,

with $\arctan(\infty) = \pi/2$. The distance on $[a_-, \infty]$ defined for
any $a$ and $a'$ in $[a_-, \infty]$ by $|\arctan(a) - \arctan(a')|$ ensures
its compactness. Therefore $E = B_{W^{s,p}}(0, I) \times [a_-, \infty]$ is
a compact subset of $(C^1 \times [a_-, \infty], d)$. We also set

    $d((f, a), (\mathcal{F}_\star, a_\star)) = \inf_{f' \in \mathcal{F}_\star} d((f, a), (f', a_\star))$ .

For any $\eta > 0$, denote by $E_\eta$ the following set

    $E_\eta = E \setminus \bigcup_{f' \in \mathcal{F}_\star} \{ (f, a) \in C^1 \times [a_-, \infty] ;\ d((f, a), (f', a_\star)) < \eta \}$ .

$E_\eta$ is a non-empty and closed subset of $E$, which is compact in
$C^1 \times [a_-, \infty]$; thus $E_\eta$ is also a compact subset of
$C^1 \times [a_-, \infty]$. By the dominated convergence theorem, the
function defined on $E_\eta$ by

    $(f, a) \mapsto h^2(p_{f,a}, p_{f_\star, a_\star})$

is continuous relatively to the topology defined by the distance $d$ on
$C^1 \times [a_-, \infty]$. The compactness of $E_\eta$ implies that
$h^2(p_{f,a}, p_{f_\star, a_\star})$ reaches its minimum on $E_\eta$. Let
$\epsilon_\eta$ be this minimum. By Theorem 3.1 and since, for any $f$ in
$B_{W^{s,p}}(0, I)$, $h^2(p_{f,\infty}, p_{f_\star, a_\star}) > 0$, we have
$\epsilon_\eta > 0$. Moreover,

    $P_\star\{ d((\hat f_n, \hat a_n), (\mathcal{F}_\star, a_\star)) > \eta \} \le P_\star\{ I(\hat f_n) > I \} + P_\star\{ h^2(\hat p_n, p_{f_\star, a_\star}) > \epsilon \lambda_n^2 \}$
        $+ P_\star\{ I(\hat f_n) \le I,\ h^2(\hat p_n, p_{f_\star, a_\star}) \le \epsilon \lambda_n^2,\ d((\hat f_n, \hat a_n), (\mathcal{F}_\star, a_\star)) > \eta \}$ .

However, if $I(\hat f_n) \le I$ and
$d((\hat f_n, \hat a_n), (\mathcal{F}_\star, a_\star)) > \eta$, then
$(\hat f_n, \hat a_n)$ belongs to $E_\eta$ and
$h^2(\hat p_n, p_{f_\star, a_\star}) \ge \epsilon_\eta$. Choosing $n$ big
enough such that $\epsilon \lambda_n^2 < \epsilon_\eta$,

    $P_\star\{ I(\hat f_n) \le I,\ h^2(\hat p_n, p_{f_\star, a_\star}) \le \epsilon \lambda_n^2,\ d((\hat f_n, \hat a_n), (\mathcal{F}_\star, a_\star)) > \eta \} = 0$ ,

and, by (26) and (27),

    $\limsup_{n \to +\infty} P_\star\{ d((\hat f_n, \hat a_n), (\mathcal{F}_\star, a_\star)) > \eta \} \le \gamma$ .

Since $\gamma$ can be chosen arbitrarily small, for any $\eta > 0$,

    $\limsup_{n \to +\infty} P_\star\{ d_{C^1}(\hat f_n, \mathcal{F}_\star) + |\arctan(\hat a_n) - \arctan(a_\star)| > \eta \} = 0$ ,

and $\lim_{n \to \infty} d_{C^1}(\hat f_n, \mathcal{F}_\star) = 0$ in
probability. Moreover, the function $\tan$ being continuous on
$[0, \pi/2[$ and since $\arctan(a_\star) \ne \pi/2$,
$\lim_{n \to \infty} |\hat a_n - a_\star| = 0$ in probability.

□



4 Numerical experiments 

In this section, we suppose the parameter $a_\star$ to be known and
illustrate the performance of the estimator $\hat f_n$ defined by (9). For
practical considerations, we choose $v = 1$ in (11) and $p = 2$. The
theoretical results provided in Section 3 rely on the assumption that
$v > 2\ell$. However, choosing $v = 1$ allows us to define an algorithm
that is easy to implement and has good convergence behavior. Using $v > 1$
would imply more involved numerical procedures to obtain parameter
estimates. Let $n$ be a positive integer; in this section, we denote by
$\hat f$ the estimator defined by (9) that maximizes the function $T$
defined by

    $T : W^{s,2} \to \mathbb{R}$ ,
    $f \mapsto \frac{1}{n} \sum_{k=0}^{n-1} \ln p_{f, a_\star}(Y_{2k}, Y_{2k+1}) - \lambda_n^2 \|f\|_{W^{s,2}}^2$ .



The HMM framework suggests to use an Expectation-Maximization (EM) type
procedure, see [8]. This algorithm iteratively produces a sequence of
estimates $\{f^p\}_{p \ge 0}$. Assume the current parameter estimate is
given by $f^p$. The estimate $f^{p+1}$ is defined as one of the maximizers
of the function $Q$ defined by

    $f \mapsto Q(f, f^p) = \frac{1}{n} \sum_{k=0}^{n-1} E_{f^p}\left[ \ln p_{f, a_\star}(X_{2k}, Y_{2k}, X_{2k+1}, Y_{2k+1}) \,|\, Y_{2k}, Y_{2k+1} \right] - \lambda_n^2 \|f\|_{W^{s,2}}^2$ ,

where $E_{f^p}[\cdot]$ denotes the expectation under the law of the
stationary HMM parameterized by $f^p$ and where

    $p_{f, a_\star}(x, y, x', y') = \nu_{a_\star}(x)\, q_{a_\star}(x, x')\, \varphi(y - f(x))\, \varphi(y' - f(x'))$ .

The differential of $f \mapsto Q(f, f^p)$ is given, for any
$f, h \in W^{s,2}$, by

    $d_f Q(\cdot, f^p)(h) = S_{n,1}(f^p, f, h) + S_{n,2}(f^p, f, h) - 2 \lambda_n^2 \sum_{0 \le |\alpha| \le s} \langle D^\alpha f, D^\alpha h \rangle_{L^2}$ ,

where

    $S_{n,1}(f^p, f, h) = -\frac{1}{n \sigma^2} \sum_{k=0}^{n-1} E_{f^p}\left[ \langle h(X_{2k}), f(X_{2k}) - Y_{2k} \rangle \,|\, Y_{2k:2k+1} \right]$ ,
    $S_{n,2}(f^p, f, h) = -\frac{1}{n \sigma^2} \sum_{k=0}^{n-1} E_{f^p}\left[ \langle h(X_{2k+1}), f(X_{2k+1}) - Y_{2k+1} \rangle \,|\, Y_{2k:2k+1} \right]$ .

$f^{p+1}$ is then defined as the function $f \in W^{s,2}$ such that for any
$h \in W^{s,2}$, $d_f Q(\cdot, f^p)(h) = 0$. In the sequel, we choose
$s = 2$ and $K = [0, 1]$; therefore, this implies, for any
$h \in W^{2,2}([0, 1], \mathbb{R})$,

    $S_{n,1}(f^p, f, h) + S_{n,2}(f^p, f, h) - 2 \lambda_n^2 \sum_{\alpha = 0}^{2} \langle D^\alpha f, D^\alpha h \rangle_{L^2} = 0$ .    (28)

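This equation can be applied to any function $h$ in
$W_0^{2,2} = \{ h \in W^{2,2}([0, 1], \mathbb{R}) ;\ h(0) = h(1) = 0 \}$.
Using integration by parts, this yields, for any component $f_j$ and any
$x \in [0, 1]$,

    $\left( 1 + \frac{1}{2 n \sigma^2 \lambda_n^2} \sum_{k=0}^{n-1} \left[ \phi^{f^p, a_\star}_{2k|2k:2k+1}(x) + \phi^{f^p, a_\star}_{2k+1|2k:2k+1}(x) \right] \right) f_j(x) - f_j^{(2)}(x) + f_j^{(4)}(x)$
        $= \frac{1}{2 n \sigma^2 \lambda_n^2} \sum_{k=0}^{n-1} \left\{ Y_{2k,j}\, \phi^{f^p, a_\star}_{2k|2k:2k+1}(x) + Y_{2k+1,j}\, \phi^{f^p, a_\star}_{2k+1|2k:2k+1}(x) \right\}$ ,    (29)

where $\phi^{f^p, a_\star}_{2k|2k:2k+1}$ and
$\phi^{f^p, a_\star}_{2k+1|2k:2k+1}$ are the filtering distributions
defined by

    $\phi^{f^p, a_\star}_{2k|2k:2k+1}(x) = \dfrac{\int \nu_{a_\star}(x)\, q_{a_\star}(x, x')\, \varphi(Y_{2k} - f^p(x))\, \varphi(Y_{2k+1} - f^p(x'))\, dx'}{p_{f^p, a_\star}(Y_{2k}, Y_{2k+1})}$ ,

    $\phi^{f^p, a_\star}_{2k+1|2k:2k+1}(x') = \dfrac{\int \nu_{a_\star}(x)\, q_{a_\star}(x, x')\, \varphi(Y_{2k} - f^p(x))\, \varphi(Y_{2k+1} - f^p(x'))\, dx}{p_{f^p, a_\star}(Y_{2k}, Y_{2k+1})}$ .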
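
The two conditional densities entering the E-step are the marginals of the joint law of $(X_{2k}, X_{2k+1})$ given the pair $(Y_{2k}, Y_{2k+1})$. A minimal sketch of their computation on a grid follows, assuming $K = [0, 1]$, $\ell = 1$, the Gaussian kernel and a midpoint Riemann sum; the function name `pair_posteriors`, the grid size and the observed values are illustrative assumptions.

```python
import math

def pair_posteriors(f, a, y0, y1, sigma2=1.0, n_cells=50):
    """Grid approximation of the laws of X_{2k} and X_{2k+1} given (y0, y1)."""
    w = 1.0 / n_cells
    grid = [(i + 0.5) * w for i in range(n_cells)]

    def noise(z):
        return math.exp(-z * z / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

    # nu_a(x0) q_a(x0, x1) is proportional to q(|x0 - x1| / a); the constant
    # cancels when dividing by the total mass below.
    joint = [[noise(y0 - f(x0)) * noise(y1 - f(x1))
              * math.exp(-0.5 * ((x0 - x1) / a) ** 2)
              for x1 in grid] for x0 in grid]
    mass = sum(sum(row) for row in joint) * w * w
    post0 = [sum(row) * w / mass for row in joint]                   # law of X_{2k}
    post1 = [sum(joint[i][j] for i in range(n_cells)) * w / mass     # law of X_{2k+1}
             for j in range(n_cells)]
    return grid, post0, post1

grid, post0, post1 = pair_posteriors(lambda x: 3.0 * x, 1.0, 0.6, 1.5)
w = 1.0 / len(grid)
```

Each returned list is a probability density on the grid, so summing it with the quadrature weight gives 1.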



4.1 Numerical approximations 



Let $N \ge 1$ be an integer. The differential system (29) is solved using a
discretization of the state space $[0, 1]$ by $\{i/N\}_{i=0}^{N}$. The
filtering distributions $\phi^{f^p, a_\star}_{2k|2k:2k+1}$ and
$\phi^{f^p, a_\star}_{2k+1|2k:2k+1}$ are approximated by piecewise constant
functions $\bar\phi^{f^p, a_\star}_{2k}$ and
$\bar\phi^{f^p, a_\star}_{2k+1}$, defined by

    $\bar\phi^{f^p, a_\star}_{2k}(x) = \sum_{i=0}^{N-1} \bar\phi_{2k,i}\, 1_{[i/N, (i+1)/N[}(x)$ and $\bar\phi^{f^p, a_\star}_{2k+1}(x) = \sum_{i=0}^{N-1} \bar\phi_{2k+1,i}\, 1_{[i/N, (i+1)/N[}(x)$ ,

where, for any $i \in \{0, \dots, N-1\}$, $\bar\phi_{2k,i}$ (resp.
$\bar\phi_{2k+1,i}$) is the approximation of
$\phi^{f^p, a_\star}_{2k|2k:2k+1}(i/N)$ (resp.
$\phi^{f^p, a_\star}_{2k+1|2k:2k+1}(i/N)$) obtained with an Euler scheme.
The equation (29) is solved on each interval $[i/N, (i+1)/N[$,
$i \in \{0, \dots, N-1\}$, which is straightforward since the coefficients
are constant and the equation is linear. For any $i \in \{0, \dots, N-1\}$
and any $j \in \{1, \dots, \ell\}$, the solution $f^{p+1}_{j,i}$ on the
interval $[i/N, (i+1)/N[$ belongs to some affine space of dimension 4.
Thus, $4N$ parameters have to be chosen to uniquely determine the solution
$f^{p+1}_j = \sum_{i=0}^{N-1} 1_{[i/N, (i+1)/N[}\, f^{p+1}_{j,i}$. The
$C^3$-regularity conditions at each interior boundary provide $4(N-1)$
equations, and solving (28) with $h(x) = 1$, $h(x) = x$, $h(x) = x^2$ and
$h(x) = x^3$ leads to four other linear equations which conclude the
computation of $f^{p+1}_j$. The procedure is displayed in Algorithm 1. The
numerical approximations and the computations of all the constants are
detailed in the supplement paper [12, Section 3].

4.2 Experimental results 

Algorithm 1 is applied with the Gaussian kernel ($a_\star = 1$):

    $\forall x \in \mathbb{R}$, $q(x) = \exp(-x^2 / 2)$ .

The aim is first to estimate the function (in this case $\ell = 3$)

    $f_\star : [0, 1] \to \mathbb{R}^3$
    $x \mapsto (3x,\ 30(x - 1/4)(x - 1/2)(x - 3/4),\ 2\cos(5x))$ .
Algorithm 1 One iteration of the algorithm

Require: $N$, $f^p$, $a_\star$, $Y_{0:2n-1}$.
Ensure: $f^{p+1}$.

for $i \in \{0, \dots, N\}$ do
    for $k \in \{0, \dots, n-1\}$ do
        Compute $\bar\phi_{2k,i}$ and $\bar\phi_{2k+1,i}$.
    end for
end for

for $j \in \{1, \dots, \ell\}$ do
    for $i \in \{0, \dots, N-1\}$ do
        Compute $f^{p+1}_{j,i}$ by solving (29).
    end for
    Set $f^{p+1}_j = \sum_{i=0}^{N-1} 1_{[i/N, (i+1)/N[}\, f^{p+1}_{j,i}$.
end for



We use $\sigma^2 = 1$ and $N = 50$ to sample observations from the
discretized model. The estimation is started with the estimate

    $f^0 : [0, 1] \to \mathbb{R}^3$
    $x \mapsto (x, 0, 0)$ .

Algorithm 1 is run with $\lambda_n^2 = \ln(n) / \sqrt{n}$. Figure 1
displays the estimate after 1, 2, 3 and 25 iterations with $n = 50000$
observations along with the true functions for each coordinate. Figure 1
shows that after a few iterations of the algorithm, the estimate can
recover the curvature of the function even with a flat initial estimate.
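
For readers who want to reproduce the setting, the sampling step can be sketched as follows. This is a minimal illustration, assuming that the discretized model places the chain on the midpoints of $N = 50$ cells of $[0, 1]$; the choice of midpoints, the function names and the seed are assumptions of this sketch and are not taken from the paper.

```python
import math
import random

def f_star(x):
    """Target function of Section 4.2, with values in R^3."""
    return (3.0 * x, 30.0 * (x - 0.25) * (x - 0.5) * (x - 0.75), 2.0 * math.cos(5.0 * x))

def sample_hmm(n_obs, a=1.0, sigma2=1.0, n_cells=50, seed=0):
    """Sample (X_k, Y_k) for k < n_obs from a grid version of the model."""
    rng = random.Random(seed)
    grid = [(i + 0.5) / n_cells for i in range(n_cells)]
    raw = [[math.exp(-0.5 * ((x - xp) / a) ** 2) for xp in grid] for x in grid]
    # The invariant law is proportional to the row sums (1 / C_a on the grid).
    row_mass = [sum(row) for row in raw]
    idx = rng.choices(range(n_cells), weights=row_mass)[0]
    sd = math.sqrt(sigma2)
    states, obs = [], []
    for _ in range(n_obs):
        x = grid[idx]
        states.append(x)
        obs.append(tuple(v + rng.gauss(0.0, sd) for v in f_star(x)))
        # Move according to the discretized kernel q_a(x, .).
        idx = rng.choices(range(n_cells), weights=raw[idx])[0]
    return states, obs

states, obs = sample_hmm(200)
```

Starting the chain from the invariant law keeps the simulated sequence stationary, in line with H1.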
Figure 1: Estimation of $f_1$, $f_2$ and $f_3$ after 25 iterations of the
algorithm; panels (a) $f_1$ and its estimates, (b) $f_2$ and its estimates,
(c) $f_3$ and its estimates. The true function (bold line) and the initial
estimate (dots) are displayed along with the estimates after 1 (squares),
2 (diamonds), 3 (crosses) and 25 (stars) iterations.



Figure 2 gives the evolution of the error as a function of the number of
observations. We consider the $L_2$-error and the $L_\infty$-error respectively defined,
for $h_1, h_2 : [0,1] \to \mathbb{R}$, by



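$$\|h_1 - h_2\|_2 \overset{\mathrm{def}}{=} \left( \frac{1}{N} \sum_{i=1}^{N} \left( h_1(i/N) - h_2(i/N) \right)^2 \right)^{1/2} \quad\text{and}\quad \|h_1 - h_2\|_\infty \overset{\mathrm{def}}{=} \sup_{1 \le i \le N} \left| h_1(i/N) - h_2(i/N) \right| .$$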



For each number of observations, 50 independent Monte Carlo runs are used to
compute the $L_2$-error after 25 iterations of the algorithm. Figure 2 shows the
median and the lower and upper quartiles over the 50 independent Monte Carlo
runs.
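The error summary described above can be sketched as follows. This is an illustration only: it assumes the two errors are evaluated on the grid $i/N$, $1 \le i \le N$, with a $1/N$ normalization for the $L_2$-error, which is an assumption about the discretization; the names `grid_errors` and `summarize_runs` are hypothetical.

```python
import numpy as np

def grid_errors(h1, h2, N=50):
    """Discrete L2 and sup errors between two functions on the grid i/N, i = 1..N."""
    x = np.arange(1, N + 1) / N
    d = np.asarray(h1(x), dtype=float) - np.asarray(h2(x), dtype=float)
    return float(np.sqrt(np.mean(d**2))), float(np.max(np.abs(d)))

def summarize_runs(errors):
    """Median and lower/upper quartiles over independent Monte Carlo runs."""
    q25, med, q75 = np.percentile(np.asarray(errors, dtype=float), [25, 50, 75])
    return {"q25": float(q25), "median": float(med), "q75": float(q75)}

# Usage: errors between x -> x and the zero function on a 4-point grid.
l2, sup = grid_errors(lambda x: x, lambda x: 0 * x, N=4)
summary = summarize_runs([1.0, 2.0, 3.0, 4.0, 5.0])
```

In an experiment, `summarize_runs` would be called once per sample size on the 50 error values obtained from the independent runs.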
[Figure 2, panels: (e) $L_2$-error for $f_3$; (f) $L_\infty$-error for $f_3$. Horizontal axis: number of observations ($\times 10^4$).]

Figure 2: $L_2$ (left) and $L_\infty$ (right) errors for each coordinate. The median (bold
line), .25 and .75 quantiles (dotted lines) and .05 and .95 quantiles (balls) over
50 independent Monte Carlo runs are represented.



5 Proofs 

5.1 Identifiability 

Lemma (Lemma 3.2). Assume I^ty and ij^} For all $x \in K$, $J_\phi(x) > 0$, where
$\phi$ is defined by (15).



Proof. By (15), (14) becomes

$$(\phi(X'_0), \phi(X'_1)) = (X_0, X_1) .$$

We now give an expression of the density of these two random vectors on $K \times K$.
Let $h$ be a bounded measurable function on $K \times K$. We have

$$\mathbb{E}[h(\phi(X'_0), \phi(X'_1))] = \int h(\phi(x_0), \phi(x_1)) \nu_b(x_0) q_b(x_0, x_1) \,\mathrm{d}x_0 \mathrm{d}x_1 . \tag{30}$$

We introduce the set

$$A \overset{\mathrm{def}}{=} \{ z \in K ;\ \forall x \in K \text{ s.t. } \phi(x) = z,\ J_\phi(x) > 0 \} .$$

Assume that $h$ is of the form

$$h(x_0, x_1) \overset{\mathrm{def}}{=} h_2(x_0, x_1) \mathbb{1}_A(x_0) \mathbb{1}_A(x_1) , \tag{31}$$

where $h_2$ is any bounded measurable function. We have



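$$\mathbb{E}[h(\phi(X'_0), \phi(X'_1))] = \int h_2(\phi(x_0), \phi(x_1)) \nu_b(x_0) q_b(x_0, x_1) \mathbb{1}_A(\phi(x_0)) \mathbb{1}_A(\phi(x_1)) \,\mathrm{d}x_0 \mathrm{d}x_1$$

$$= \int h_2(\phi(x_0), \phi(x_1)) \frac{\nu_b(x_0) q_b(x_0, x_1)}{J_\phi(x_0) J_\phi(x_1)} \mathbb{1}_A(\phi(x_0)) \mathbb{1}_A(\phi(x_1)) \times J_\phi(x_0) J_\phi(x_1) \,\mathrm{d}x_0 \mathrm{d}x_1 .$$

By [13, Theorem 2, p. 99] and the area formula, for almost every $z \in K$, $\phi^{-1}(\{z\})$
is at most countable and we can apply the change of variable $z_0 = \phi(x_0)$,
$z_1 = \phi(x_1)$:

$$\mathbb{E}[h(\phi(X'_0), \phi(X'_1))] = \int h_2(z_0, z_1) \mathbb{1}_A(z_0) \mathbb{1}_A(z_1) \sum_{\substack{x_0 \in \phi^{-1}(\{z_0\}) \\ x_1 \in \phi^{-1}(\{z_1\})}} \frac{\nu_b(x_0) q_b(x_0, x_1)}{J_\phi(x_0) J_\phi(x_1)} \,\mathrm{d}z_0 \mathrm{d}z_1 .$$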

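Moreover,

$$\mathbb{E}[h(X_0, X_1)] = \int h_2(z_0, z_1) \nu_{a_\star}(z_0) q_{a_\star}(z_0, z_1) \mathbb{1}_A(z_0) \mathbb{1}_A(z_1) \,\mathrm{d}z_0 \mathrm{d}z_1 .$$

Therefore, for almost any $(z_0, z_1) \in K \times K$,

$$\nu_{a_\star}(z_0) q_{a_\star}(z_0, z_1) \mathbb{1}_A(z_0) \mathbb{1}_A(z_1) = \mathbb{1}_A(z_0) \mathbb{1}_A(z_1) \sum_{\substack{x_0 \in \phi^{-1}(\{z_0\}) \\ x_1 \in \phi^{-1}(\{z_1\})}} \frac{\nu_b(x_0) q_b(x_0, x_1)}{J_\phi(x_0) J_\phi(x_1)} .$$

By Sard's theorem (see [5]), since $\phi$ is $C^1$,

$$\mu\left(\{ z \in K ;\ \exists x \in K,\ \phi(x) = z \text{ and } J_\phi(x) = 0 \}\right) = 0 .$$

Therefore, the function $z \mapsto \mathbb{1}_A(z)$ equals 1 almost everywhere in $K$. Finally,
for almost any $(z_0, z_1) \in K \times K$,

$$\nu_{a_\star}(z_0) q_{a_\star}(z_0, z_1) = \sum_{\substack{x_0 \in \phi^{-1}(\{z_0\}) \\ x_1 \in \phi^{-1}(\{z_1\})}} \frac{\nu_b(x_0) q_b(x_0, x_1)}{J_\phi(x_0) J_\phi(x_1)} . \tag{32}$$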

Let us assume that there exists $x_0 \in K$ such that $J_\phi(x_0) = 0$. There also
exists $x \in K$ such that $J_\phi(x) > 0$ (otherwise $\mu(K) = \mu(\phi(J_\phi^{-1}(\{0\}))) = 0$
by Sard's theorem). By the intermediate value theorem, for all large enough $k \in \mathbb{N}^*$,
there exists $x_k \in K$ such that $J_\phi(x_k) = 1/k$. By the inverse function theorem,
there exists a neighbourhood $U_k$ of $x_k$ in $K$ such that $\phi|_{U_k}$ is a diffeomorphism.
$J_\phi$ being continuous, there also exists a neighbourhood $V_k$ of $x_k$ such that, for all
$x \in V_k$, $|J_\phi(x) - J_\phi(x_k)| < 1/(2k)$. Therefore, for all $x$ in $V_k$,

$$\frac{3}{2k} > J_\phi(x) > J_\phi(x_k) - \frac{1}{2k} = \frac{1}{2k} .$$

Let $W_k = U_k \cap V_k$; $\phi|_{W_k}$ is a diffeomorphism and $\mu(\phi(W_k)) > 0$. Therefore, there
exists $(z_{k,0}, z_{k,1}) \in \phi(W_k) \times \phi(W_k)$ such that (32) is true. We denote by $x_{k,0}$
and $x_{k,1}$ the unique elements of $W_k$ such that $z_{k,0} = \phi(x_{k,0})$ and $z_{k,1} = \phi(x_{k,1})$.
Then,

$$\nu_{a_\star}(z_{k,0}) q_{a_\star}(z_{k,0}, z_{k,1}) = \sum_{\substack{x_0 \in \phi^{-1}(\{z_{k,0}\}) \\ x_1 \in \phi^{-1}(\{z_{k,1}\})}} \frac{\nu_b(x_0) q_b(x_0, x_1)}{J_\phi(x_0) J_\phi(x_1)} \ge \frac{\nu_b(x_{k,0}) q_b(x_{k,0}, x_{k,1})}{J_\phi(x_{k,0}) J_\phi(x_{k,1})} \ge \left(\frac{2k}{3}\right)^2 \nu_b(x_{k,0}) q_b(x_{k,0}, x_{k,1}) .$$

By I^2j|i|, $(x_0, x_1) \mapsto \nu_c(x_0) q_c(x_0, x_1)$ is bounded for any $0 < c < \infty$: there exist
$0 < C^- < C^+$ such that, for any $(x_0, x_1) \in K^2$, $0 < C^- < \nu_c(x_0) q_c(x_0, x_1) < C^+$.
Hence, for any $k \ge 1$ large enough,

$$C^+ > \left(\frac{2k}{3}\right)^2 C^- ,$$

which is absurd and concludes the proof. □



Lemma (Lemma 3.3). Assume HSwy and HA Then, the function $\phi$ defined by
(15) is a covering map.

Proof. (i) comes from the continuity of $(f_\star)^{-1}$ and $f$, and (ii) is true since
$\mathrm{Im}(f_\star) = \mathrm{Im}(f)$. For (iii), let $z \in K$ and assume the set
$\phi^{-1}(\{z\}) \overset{\mathrm{def}}{=} \{x \in K ;\ \phi(x) = z\}$ is infinite. By Lemma 3.2, $\phi$ is of full rank;
then, by the inverse function theorem, for each $x \in \phi^{-1}(\{z\})$, there exists an open
neighborhood $V_x$ of $x$ such that the function $\phi : V_x \to \phi(V_x)$ is a diffeomorphism and such
that the $\{V_x\}_{x \in \phi^{-1}(\{z\})}$ are pairwise disjoint. By t|2j|i|, there exists $n \in \mathbb{N}$
such that $\bigcup_{x \in \phi^{-1}(\{z\})} V_x$ can be covered with only $n$ subsets of the form $V_{x_i}$,
$x_i \in \phi^{-1}(\{z\})$, $i \in \{1, \dots, n\}$. Therefore, for any $x \in \phi^{-1}(\{z\})$, there exists
$i \in \{1, \dots, n\}$ such that $x \in V_{x_i}$. If $x \notin \{x_i\}_{i=1}^n$, then $x \in V_x \cap V_{x_i}$, which is
absurd since $V_x$ and $V_{x_i}$ are disjoint. Therefore, $\phi^{-1}(\{z\}) \overset{\mathrm{def}}{=} \{x \in K ;\ \phi(x) = z\}$
is finite and denoted by $\{x_i\}_{i=1}^n$, $n \in \mathbb{N}^*$. Let $\{V_i\}_{i=1}^n$ be disjoint open subsets
of $K$ such that $x_i \in V_i$ and define $V \overset{\mathrm{def}}{=} \bigcap_{i=1}^n \phi(V_i)$, $O_i = \phi|_{V_i}^{-1}(V)$. Then, $V$ is
an open neighborhood of $z$, which concludes the proof. □



Lemma (Lemma 3.4). Assume i^jjii]). Then, every covering map $\phi : K \to K$
is a one to one function.

Proof. Assume there exist $x_1$ and $x_2$ in $K$ such that $x_1 \ne x_2$ and $\phi(x_1) =
\phi(x_2) = y$. By tj2}[n|, $K$ is path-connected and there exists a continuous path
$\gamma : [0, 1] \to K$ such that $\gamma(0) = x_1$ and $\gamma(1) = x_2$. Then $\phi \circ \gamma$ is a continuous
path taking values in $K$ such that $\phi \circ \gamma(0) = \phi \circ \gamma(1) = y$. If $\tilde\gamma$ denotes the path
defined by, for all $t \in [0, 1]$, $\tilde\gamma(t) = y$, then $\phi \circ \gamma$ and $\tilde\gamma$ are two paths in $K$ with
the same initial and terminal values. By ri2pD), $K$ is simply connected and $\phi \circ \gamma$
and $\tilde\gamma$ are path homotopic (see [18, p. 151]). The function $u : [0, 1] \to K$ such
that, for all $t \in [0, 1]$, $u(t) = x_1$ is a lift (see [18, p. 237]) of $\tilde\gamma$ for the covering
map $\phi$. Moreover, $\gamma$ is a lift of $\phi \circ \gamma$ for the covering map $\phi$. By the homotopy
lifting property (see [18, Proposition 11.11, p. 238]), since $u(0) = \gamma(0) = x_1$,
$u$ and $\gamma$ are path homotopic and have the same extremity: $x_1 = x_2$. This
is absurd. □



5.2 Proof of Proposition 3.6 



Proposition 3.6 provides a deviation inequality on the empirical process renormalized
by $I^2(f)$. First of all, for any $M \ge 1$ the Sobolev ball of radius $M$
centred at 0 is denoted by $W_M^{s,p}$. Define the following collections of functions on
$\mathbb{R}^{2\ell}$:



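$$\mathcal{G}_M \overset{\mathrm{def}}{=} \left\{ g_{p_{f,a}} ;\ f \in W_M^{s,p},\ a > a_- \right\} \quad\text{and}\quad \bar{\mathcal{G}}_M \overset{\mathrm{def}}{=} \left\{ g - \mathbb{E}_\star[g(Y_0, Y_1)] ;\ g \in \mathcal{G}_M \right\} ,$$

where $\mathbb{E}_\star$ is the expectation under the distribution $\mathbb{P}_\star$.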

The first step of the proof establishes a deviation inequality on the empirical
process restricted to the Sobolev balls $W_M^{s,p}$, $\sup_{g \in \mathcal{G}_M} |\nu_n(g)|$. The dependency
in $M$ of this inequality allows the determination of a lower bound on $\upsilon$ in the
penalty (11) sufficient to establish Proposition 3.6. The second step and conclusion
of the proof consists in using the peeling device with the decomposition:

$$W^{s,p} = W_1^{s,p} \cup \bigcup_{k \ge 0} \left\{ W_{2^{k+1}}^{s,p} \setminus W_{2^k}^{s,p} \right\}$$

in order to apply the deviation inequalities on $\sup |\nu_n(g)|$ to each band
$\{W_{2^{k+1}}^{s,p} \setminus W_{2^k}^{s,p}\}$. Proposition 5.1 gives a concentration inequality on the
restricted empirical processes.
Proposition 5.1. Assume tfflj^^, £f5| and There exist some

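positive constants $K_1$, $K_2$, $C$ and $c$, depending on $f_\star$ and $a_\star$ such that, for any
$M \ge 1$, any $n \ge 1$ and any $t \ge Cn^{-1/2}$,

$$\mathbb{P}_\star\left( \sup_{g \in \mathcal{G}_M} |\nu_n(g)| \ge c\, \mathbb{E}_\star\left[ \sup_{g \in \mathcal{G}_M} |\nu_n(g)| \right] + Mt \right) \le K_1 \left[ e^{-K_2 t^2} + e^{-K_2 t} \right] . \tag{33}$$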



The proof of Proposition 5.1 is given in the supplement paper [12, Section 2]
and relies on the concentration results of pQ. It remains to control
$\mathbb{E}_\star\left[ \sup_{g \in \mathcal{G}_M} |\nu_n(g)| \right]$ for any $M \ge 1$.

Proposition 5.2. Assume 7^7} 7Q[i|(|iii]),^ TQi} and 7^ There exists a 
positive constant $K$ depending on $\upsilon$, such that, for any $M \ge 1$,

$$\mathbb{E}_\star\left[ \sup_{g \in \mathcal{G}_M} |\nu_n(g)| \right] \le K M^{\upsilon+1} . \tag{34}$$



The proof of Proposition 5.2 is given in Appendix B. We now combine Proposition 5.1
and Proposition 5.2 to obtain a deviation inequality on the empirical
process restricted to the truncated collection of functions $\mathcal{G}_M$. Let n > 0. There
exist $K_1$, $K_2$ and $K_3$ such that for any $M \ge 1$, any $n \ge 1$ and any $t \ge C/\sqrt{n}$,

$$\mathbb{P}_\star\left( \sup_{g \in \mathcal{G}_M} |\nu_n(g)| \ge K_3 M^{\upsilon+1} + Mt \right) \le K_1 \left( e^{-K_2 t^2} + e^{-K_2 t} \right) . \tag{35}$$



Proposition 3.6 is obtained by applying the peeling device as in [25, Lemma 5.14].
Let $\{x_k\}_{k \in \mathbb{N}^*}$ be some chosen weights such that

$$\sum_{k \ge 1} e^{-x_k} < +\infty \quad\text{and, for any } k \ge 1, \quad C \vee 1 \le x_k \le 2^{k\upsilon} .$$



Let k > 0, for any positive x, if i = ir + Xfc, for any n > 1, we have t > C > 

Since i > x fe > 1 and x fc < 2 to , we have e"^ 2 ' 2 < e"^ 2 ' and t < 2 kv (x + 1). 
Plugging these relations into (35) leads to



sup 

jew*?, a>a_ 



5p/ a d(P„ - P*) 



ofc(u+l) 



where K[ = 2^ and ^ = if 3 + 1. If T = 2 t, + 1 K 3 , 

|/ap /|S d(P n -P0| > T + 



< 



sup 

feW s 'P, a>a_ 



sup 



!*(/) V 1 
f <? P/ , a d(P„-P*) 



x 



> 



In 

T + x 



In 



sup 



fc=0 /ew a r + i, «>« 



/" gp /ia d(P„ - P*) 



u+ll ^ + X 
However, since $T \ge K_3'$ and $x_1 > 0$, by (36) applied with $k = 0$,



sup 



5p/a d(P„-P*) 



> ^ 1 < Me-*" 
\/n \ 



Therefore, by the definition of $T$ and by (36),



sup 



l/g P /,qd(P n -P*)| T + z 
J 2 (/)V1 - 



<^e- K2a; + VpJ sup 

fe=0 1/6WS5.1. »>»- 



> (2 fc+i ) 

OO 



ffp/ a d(P„ - P*) 



k+i\v+i K'a + x/2 v+ 



k=0 



This last equation ensures the existence of some positive constants K and £ 
such that 



sup 



\Jg Pfa d(P n -V*)\ T + x 
/ 2 (/)Vl " ^ 



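For the sake of simplicity, for any $f \in W^{s,p}$ and $x = (x_0, x_1) \in K \times K$, we set,
for any $a > 0$,

$$f(x) \overset{\mathrm{def}}{=} (f(x_0), f(x_1)) \in \mathbb{R}^{2\ell} \quad\text{and}\quad \nu_a(x) \overset{\mathrm{def}}{=} \nu_a(x_0) q_a(x_0, x_1) . \tag{37}$$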

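This appendix is devoted to the proof of an intermediate lemma on the
envelope functions of the sets $\mathcal{G}_M$ and $\bar{\mathcal{G}}_M$ defined, for any $y \in \mathbb{R}^{2\ell}$, by

$$G_M(y) \overset{\mathrm{def}}{=} \sup_{g \in \mathcal{G}_M} g(y) \quad\text{and}\quad \bar{G}_M(y) \overset{\mathrm{def}}{=} \sup_{g \in \bar{\mathcal{G}}_M} g(y) .$$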

9&Sm geg M 

Lemma A.l. Assume -ff^fli]), ipHm]), an d There exists a constant 

$C_G > 0$ such that, for any $y \in \mathbb{R}^{2\ell}$,

$$G_M(y) \le C_G (1 + M\|y\|) .$$

Proof. For any y € M. 2e , any / € W^f and any a > a_, 



2 2 2 V p/*,o*(y 



y) 

1, 11, A ^(x)exp(-||/(x)-y|| 2 /2a 2 ) ' 

" 2 n 2 + 2 n I + x & e U P 2 ^(x)exp(-||r(x)-y|| 2 /2 ( T 2 ) 
By tj2j[i|, ([3]), Q and (37), there exists a constant c„ > 1 such that

sup -— < c v . 



Therefore, 



1, f u exp(-||f(x)-y|| 2 /2a 2 ) ^ 

5p/ a y) = o 111 O + O 111 1 + ° v SU P 1 \\t*t \ H2/0 2\ 

2 2 2 V xe/f 2exp(-||/*(x) -y|| 2 /2cr 2 ) 



By EJ3| and E|4||i| /* is bounded and there exists a constant c such that 

exp(-||/(x)-y|| 2 /2a 2 ) 

exp(-||/*(x)-y||V2^) " ^ Wl + ll/(x) " ' " yll)) ' 
Then, there exists a constant c such that 

5 P/ >(y)<c(l + ||/(x)||.||y||), 

and the proof is concluded by ([8|. □ 

Lemma A.1 implies that there exists a constant $C > 0$ such that, for any
$y \in \mathbb{R}^{2\ell}$,

$$\bar{G}_M(y) \le C (1 + M\|y\|) . \tag{38}$$



B 



We prove Proposition 5.2 using entropy with bracketing arguments on the class
of functions $\mathcal{G}_M$. Define the class of functions

$$\mathcal{P}_M = \left\{ p_{f,a} :\ f \in W_M^{s,p},\ a > a_- \right\} .$$

Let $\|\cdot\|$ be a norm on $\mathcal{G}$; the entropy with bracketing for the norm $\|\cdot\|$ is defined
as follows:

Definition B.1. Let $\mathcal{G}$ be some class of functions. For any positive $\delta$, let
$N_{[]}(\delta, \mathcal{G}, \|\cdot\|)$ be the smallest $N$ such that there exists a set of brackets $\{[g_i^L, g_i^U]\}_{i=1}^N$
for which $\|g_i^U - g_i^L\| \le \delta$ for all $i \in \{1, \dots, N\}$, and for any $g$ in $\mathcal{G}$, there exists
$i \in \{1, \dots, N\}$ such that

$$g_i^L \le g \le g_i^U .$$

$N_{[]}(\delta, \mathcal{G}, \|\cdot\|)$ is called the $\delta$-number with bracketing of $\mathcal{G}$, and $H_{[]}(\delta, \mathcal{G}, \|\cdot\|) =
\ln N_{[]}(\delta, \mathcal{G}, \|\cdot\|)$ is the $\delta$-entropy with bracketing of $\mathcal{G}$.

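Let $\mathbf{Y} = \{\mathbf{Y}_k\}_{k \in \mathbb{Z}}$ be the observation process defined, for all $k \in \mathbb{Z}$, by
$\mathbf{Y}_k \overset{\mathrm{def}}{=} (Y_{2k}, Y_{2k+1})$. A way of measuring the dependency of the process $\mathbf{Y}$ is
the determination of its $\beta$-mixing coefficients defined, for any $n \ge 1$, by

$$\beta_n \overset{\mathrm{def}}{=} \sup_{u \in \mathbb{Z}} \sup_{A \in \mathcal{G}_{u+n}^{\infty}} \left| \mathbb{P}_\star(A \mid \mathcal{H}_{-\infty}^{u}) - \mathbb{P}_\star(A) \right| , \tag{39}$$

where $\mathcal{H}_{-\infty}^{u} = \sigma(\mathbf{Y}_k, k \le u)$ and $\mathcal{G}_{u+n}^{\infty} = \sigma(\mathbf{Y}_k, k \ge u + n)$. Let $\{\beta_n\}_{n \ge 1}$ be
defined by (39); then, by combining [33, Chapter 9] and the results on the
control of the ergodicity of Markov chains by coupling techniques of [S], it can
be proved that there exist $\beta$ in $(0, 1)$ and $C > 0$ such that, for any $n \ge 1$,

$$\beta_n \le C \beta^n . \tag{40}$$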



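Define the mixing rate function $\beta(\cdot)$ by $\beta(t) = \beta_{\lfloor t \rfloor}$ if $t \ge 1$ and $\beta(t) = 1$
otherwise. For any numerical function $g$, we denote by $Q_g$ the quantile function
of $|g(\mathbf{Y}_0)|$ and define the norm $\|g\|_{2,\beta}$ as in [10] by

$$\|g\|_{2,\beta} \overset{\mathrm{def}}{=} \left( \int_0^1 \beta^{-1}(u) \left[ Q_g(u) \right]^2 \mathrm{d}u \right)^{1/2} ,$$

where $\beta^{-1}$ denotes the càdlàg inverse of the function $\beta(\cdot)$. We also denote by
$\mathcal{L}_{2,\beta}(\mathbb{P}_\star)$ the class of numerical functions $g$ such that $\|g\|_{2,\beta} < \infty$.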

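Proposition B.2. Assume tf^, i^Um]) and For any $p' > 1$, $s' > 2\ell/p'$,
any integer $r \ge 1$ and any even number $b$ such that $b > s' + 2\ell(1 - 1/p')$, there
exists a positive constant $C$ such that:

$$\forall \epsilon > 0,\ \forall M \ge 1, \quad H_{[]}\left(\epsilon, \mathcal{G}_M, \|\cdot\|_{2,\beta}\right) \le C \left( \frac{M^{s'+b+2\ell/p'}}{\epsilon^{2r}} \right)^{2\ell/s'} .$$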



The proof of Proposition B.2 is given in Appendix C. Proposition B.2 makes it
possible to apply [10, Theorem 3] to the class of functions $\mathcal{G}_M$. Let $B$ be the
function defined on $\mathbb{R}_+$ by $B(x) = \int_0^x \beta^{-1}(t)\,\mathrm{d}t$ and, for any $\epsilon > 0$,
$\delta_M(\epsilon) = \sup_{t \le \epsilon} Q_{G_M}(t) \sqrt{B(t)}$. The following lemma is an application of
[10, Lemma 2]; it makes it possible to bound the $\|\cdot\|_{2,\beta}$-norm by $\|\cdot\|_{L_{2r}(\mathbb{P}_\star)}$ for all
$r > 1$. For any $g$ in $\mathcal{L}_{2,\beta}$ and any $r > 1$,



\9\\2,p 



< 




(41) 



Moreover, by [20, Lemma 7.26], for any natural number $r \ge 1$ there exists a
positive constant $C$ such that for any $f$ in $W^{s,p}$ and $a > 0$,



IISp/.JIWP*) ^ Ch (Pf,a,Pf*,aJ 



(42) 



The Hellinger distance being bounded, (42) and (41) state the existence of a
positive number $d$ such that $\|g_{p_{f,a}}\|_{2,\beta} \le d$ for all $f$ in $W^{s,p}$ and $a > a_-$. Define
for any $M \ge 1$, $\varphi_M \overset{\mathrm{def}}{=} \int_0^d \sqrt{H_{[]}(u, \mathcal{G}_M, \|\cdot\|_{2,\beta})}\,\mathrm{d}u$. Thus, by [10, Theorem 3],
provided that $\delta_M(\epsilon) \to 0$, there exists a constant $C$ such that



sup \v n {g)\ 



< Cv? M 1 + 



S M (1 A £„,m) 



(43) 
where $\epsilon_{n,M}$ is the unique solution on $\mathbb{R}_+$ of the equation:

$$\frac{x^2}{B(x)} = \frac{\varphi_M^2}{n d^2} .$$



In the sequel, we control the quantities appearing in (43). By Proposition B.2
and the definition of $\varphi_M$, for any $p' > 1$, $s' > 2\ell/p'$, $r \ge 1$ and any even number
$b$ such that $b > s' + 2\ell(1 - 1/p')$, there exists a constant $C$ depending on $p'$, $s'$, $r$
and $b$, such that

$$\varphi_M \le C \left( M^{s'+b+2\ell/p'} \right)^{\ell/s'} \int_0^d u^{-2r\ell/s'} \,\mathrm{d}u , \tag{44}$$

with $\int_0^d u^{-2r\ell/s'}\,\mathrm{d}u < \infty$ whenever $s' > 2r\ell$. If $b$ is the unique even number such
that $s' + 2\ell(1 - 1/p') < b \le \lceil s' + 2\ell(1 - 1/p') \rceil + 1$ and if $s'$ tends to infinity in
(44), then $\left( M^{s'+b+2\ell/p'} \right)^{\ell/s'} \to M^{2\ell}$ as $s' \to \infty$ and it follows that

$$\forall \eta > 0,\ \exists C > 0,\ \forall M \ge 1, \quad \varphi_M \le C M^{2\ell+\eta} .$$

By t(5j there exists a constant $C$ such that

$$\varphi_M \le C M^{\upsilon} . \tag{45}$$

Lemma B.3. Assume and There exists C > 0, such that, for any 

$M \ge 1$ and any $t \in (0, 1)$,

$$Q_{G_M}(t) \le C M \left( 1 + \ln^{1/2}(1/t) \right) .$$



Proof. Set $\boldsymbol{\epsilon}_0 = (\epsilon_0, \epsilon_1)$ and let $u \ge C_G$, where $C_G$ is defined in Lemma A.1.



\{G M (Y ) >u} < P*{C G (1 + M||Y ||) >u} 

u/C G - 1 ' 



< P*<!IIYo||> 



A I 



- --.<!iir(Xo)ii + ii6oii> u/c ^. 1 

< P.<!|M> U ^ 1 -c 

where ||/*(x)|| < Cqo for all x in K 2 (/* is bounded by and Using 
Cirelson-Ibragimov-Sudakov inequality, see Section 1.2.1], for any x > 

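$$\mathbb{P}_\star\left\{ \|\boldsymbol{\epsilon}_0\| - \mathbb{E}(\|\boldsymbol{\epsilon}_0\|) \ge x \right\} \le e^{-x^2/(2\sigma^2)} .$$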



2G 



Hence, 



\{G M (Y Q )>u} < exp 



u/Cg-l 
M 



-E( 



eo| 



2o- 2 



exp 



2a 2 



where C\ = and c 2 =€00+ E* [||e ||]. Setting 1 > t > 0, let u be such that 
t = ¥,{G M (Y G )>u} 1 then, 

\ 2 \ 



exp 



2(T 2 



implies 



u < — I Mc 2 + 1 + M ( 2a 2 In 



> i 



1/2N 



which concludes the proof. □

By (40), there exists a constant $C > 0$ such that

$$\forall x \in (0, 1), \quad B(x) \le C x \left( 1 + \ln(1/x) \right) . \tag{46}$$
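The logarithmic growth of $B$ under geometric mixing can be checked numerically. The sketch below is an illustration under stated assumptions: the mixing coefficients are taken equal to $\min(1, C\rho^n)$ with hypothetical values $C = 2$ and $\rho = 1/2$ (the text only provides an upper bound of this form), and $B$ is computed through the layer-cake identity $\int_0^x \beta^{-1}(u)\,du = \int_0^\infty \min(x, \beta(t))\,dt$, which for a step function $\beta(\cdot)$ reduces to a series. The function names are hypothetical.

```python
import math

def B(x, C=2.0, rho=0.5, tol=1e-15):
    """B(x) = sum over m >= 0 of min(x, beta_m), with beta_0 = 1 and beta_m = min(1, C * rho**m).

    Assumes C >= 1 and 0 < rho < 1; the series is truncated once beta_m < tol.
    """
    total, m = 0.0, 0
    while True:
        beta_m = 1.0 if m == 0 else min(1.0, C * rho**m)
        total += min(x, beta_m)
        if beta_m < tol:
            break
        m += 1
    return total

def log_bound(x, C=2.0, rho=0.5):
    """K * x * (1 + ln(1/x)) with K = 1 + (1 + ln C) / ln(1/rho), valid when C >= 1."""
    K = 1.0 + (1.0 + math.log(C)) / math.log(1.0 / rho)
    return K * x * (1.0 + math.log(1.0 / x))

# Usage: compare B with the logarithmic bound at a few points of (0, 1).
points = [1e-6, 1e-3, 0.1, 0.5, 0.99]
checks = [(x, B(x), log_bound(x)) for x in points]
```

The constant $K$ used here comes from bounding $\beta^{-1}(u)$ by $1 + \ln(C/u)/\ln(1/\rho)$ and integrating; it is one admissible constant for this toy sequence, not the constant of the paper.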



Lemma B.4. Assume ^Q|i]) fflfic! There exists $C > 0$ such that for any
$0 < \epsilon < 1$,



8 M {e)<CMUl 2 hx{^\ 
Proof. By Lemma B.3,



e<e- 



< CM 



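$$Q_{G_M}(t) \le C M \left( 1 + \ln^{1/2}(1/t) \right) .$$

Therefore, by (46),

$$Q_{G_M}(t) \sqrt{B(t)} \le C M \left( 1 + \ln^{1/2}(1/t) \right) \sqrt{t} \sqrt{1 + \ln(1/t)} .$$

For $t \ge e^{-1}$, $\ln(t^{-1}) \le 1$ and $Q_{G_M}(t)\sqrt{B(t)} \le CM$.
For $t \le e^{-1}$, $\ln(t^{-1}) \ge 1$; this yields, for $t \le e^{-1}$,

$$Q_{G_M}(t) \sqrt{B(t)} \le C M\, t^{1/2} \ln(t^{-1}) .$$

The proof is concluded upon noting that the function $t \mapsto t^{1/2} \ln(t^{-1})$
reaches its maximum at $e^{-2}$. □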
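The location of the maximum used at the end of this proof can be confirmed numerically. The sketch below is only a sanity check on a grid, assuming the function in question is $t \mapsto t^{1/2}\ln(t^{-1})$ on $(0, 1]$; the name `h` is hypothetical. Setting the derivative $\frac{1}{2\sqrt{t}}\ln(t^{-1}) - \frac{1}{\sqrt{t}}$ to zero gives $\ln(t^{-1}) = 2$, i.e. $t = e^{-2}$, with maximal value $2/e$.

```python
import numpy as np

def h(t):
    """t -> sqrt(t) * ln(1/t), the function maximized at the end of the proof."""
    return np.sqrt(t) * np.log(1.0 / t)

# Grid search over (0, 1]; the spacing is about 5e-6.
t = np.linspace(1e-6, 1.0, 200_001)
t_max = t[np.argmax(h(t))]
h_max = h(np.exp(-2.0))
```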



Finally, Lemma B.4 ensures that $\delta_M(\epsilon) \to 0$ as $\epsilon \to 0$ for any $M \ge 1$, and
Proposition 5.2 results from (43), Lemma B.4 and (45).
C

The aim of this appendix is to prove Proposition B.2. The computation of
$H_{[]}(\epsilon, \mathcal{G}_M, \|\cdot\|_{2,\beta})$ is not an easy task as the dependency of $\|g\|_{2,\beta}$ in $g$ only
appears through the quantile function $Q_g$. Moreover, the dependency in $M$ of
the entropy $H_{[]}(\epsilon, \mathcal{G}_M, \|\cdot\|_{2,\beta})$ is not straightforward. The next lemma makes it
possible to control the bracketing entropy of $\mathcal{G}_M$ relatively to the $\|\cdot\|_{2,\beta}$-norm by the
entropy of $\mathcal{P}_M$ relatively to the $\|\cdot\|_{L_1(\mathbb{R}^{2\ell})}$-norm.

Lemma C.1. For any integer $r \ge 1$, there exists a constant $C$ such that:

$$H_{[]}\left(\epsilon, \mathcal{G}_M, \|\cdot\|_{2,\beta}\right) \le C\, H_{[]}\left(\epsilon^{2r}, \mathcal{P}_M, \|\cdot\|_{L_1(\mathbb{R}^{2\ell})}\right) .$$

Proof. The function $\ln$ being increasing, if $[p_L, p_U]$ is a bracket for $\mathcal{P}_M$, then
$[g_{p_L}, g_{p_U}]$ is a bracket for $\mathcal{G}_M$. Moreover, by [20, Lemma 7.26], there exists a
positive constant $C$ such that

$$\|g_{p_U} - g_{p_L}\|_{L_{2r}(\mathbb{P}_\star)}^{2r} \le C \left\| \sqrt{p_U} - \sqrt{p_L} \right\|_{L_2(\mathbb{R}^{2\ell})}^{2} .$$

Moreover it is straightforward that $\|\sqrt{p_U} - \sqrt{p_L}\|_{L_2(\mathbb{R}^{2\ell})}^2 \le \|p_U - p_L\|_{L_1(\mathbb{R}^{2\ell})}$.
The proof is concluded using (41). □

[22] provides results on the entropy rates for function classes of Besov or
Sobolev type. Therefore, to control the entropy rate of $\mathcal{P}_M$ we prove that it
is included in some weighted Sobolev space. Define the polynomial weighting
function $\langle y \rangle^b \overset{\mathrm{def}}{=} (1 + \|y\|^2)^{b/2}$ parametrized by $b \in \mathbb{R}$, where $y \in \mathbb{R}^{2\ell}$. Furthermore,
define for $p' > 1$ and $s' > 2\ell/p'$ the weighted Sobolev space

$$W^{s',p'}\left(\mathbb{R}^{2\ell}, \langle y \rangle^b\right) \overset{\mathrm{def}}{=} \left\{ f :\ f \cdot \langle y \rangle^b \in W^{s',p'}\left(\mathbb{R}^{2\ell}, \mathbb{R}\right) \right\} .$$

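Lemma C.2. Assume f^j^, jp||iii| and For any $p' > 1$, $s' > 2\ell/p'$ and
any even and positive number $b$, there exists a positive constant $C$ such that

$$\forall f \in W^{s,p},\ \forall a > a_-, \quad \left\| p_{f,a} \cdot \langle y \rangle^b \right\|_{W^{s',p'}(\mathbb{R}^{2\ell}, \mathbb{R})} \le C \left( 1 \vee \|f\|_{W^{s,p}} \right)^{s'+b+2\ell/p'} .$$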
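Proof. Let $f$ be a function in $W^{s,p}$; for any $a > a_-$,

$$\left\| p_{f,a} \cdot \langle y \rangle^b \right\|_{W^{s',p'}(\mathbb{R}^{2\ell}, \mathbb{R})}^{p'} = \sum_{|\alpha| \le s'} \left\| D^{\alpha}\left( p_{f,a} \cdot \langle y \rangle^b \right) \right\|_{L_{p'}}^{p'} .$$

Applying the general Leibniz rule component by component, for any $\alpha \in \mathbb{N}^{2\ell}$,

$$D^{\alpha}\left( p_{f,a} \cdot \langle y \rangle^b \right) = \sum_{\alpha' \le \alpha} \binom{\alpha}{\alpha'} D^{\alpha'}\left( \langle y \rangle^b \right) D^{\alpha - \alpha'}\left( p_{f,a} \right) , \tag{47}$$

where $\binom{\alpha}{\alpha'} = \prod_{j=1}^{2\ell} \binom{\alpha_j}{\alpha'_j}$. Thus, Lemma C.2 results from the control of
$\| D^{\alpha^{(1)}}(\langle y \rangle^b) D^{\alpha^{(2)}}(p_{f,a}) \|_{L_{p'}}$
for any given $\alpha^{(1)}$ and $\alpha^{(2)}$ in $\mathbb{N}^{2\ell}$. It is straightforward that, for any $\alpha$ in $\mathbb{N}^{2\ell}$,
there exists a polynomial function $P_\alpha$ whose degree does not exceed $|\alpha|$ such
that, for any $y \in \mathbb{R}^{2\ell}$,

$$D^{\alpha} p_{f,a}(y) = \int_{x \in K^2} \nu_a(x) P_\alpha(f(x) - y) \exp\left\{ - \frac{\|f(x) - y\|^2}{2\sigma^2} \right\} \mathrm{d}x . \tag{48}$$

Moreover, since $b$ is an even number, for any $\alpha \in \mathbb{N}^{2\ell}$ such that $|\alpha| \le b$,
$D^{\alpha} \langle y \rangle^b$ is a polynomial function denoted by $P_{b,\alpha}$ whose degree does not
exceed $b - |\alpha|$. In the case where $|\alpha| > b$, $D^{\alpha} \langle y \rangle^b = 0$. Since $P_{\alpha^{(2)}}$ and $P_{b,\alpha^{(1)}}$
are both polynomial functions, and since ^ ensures that, for any $x$ in $K^2$,
$\|f(x)\| \le \sqrt{2}\kappa \|f\|_{W^{s,p}} \le \sqrt{2}\kappa (1 \vee \|f\|_{W^{s,p}})$, there exists a constant $C$ depending
on $\alpha^{(1)}$, $\alpha^{(2)}$ and $b$ such that, for any $y$ in $\mathbb{R}^{2\ell}$ and any $x$ in $K^2$,

$$\left| P_{b,\alpha^{(1)}}(y) P_{\alpha^{(2)}}(f(x) - y) \right| \le C (1 + \|y\|)^{b - |\alpha^{(1)}|} \mathbb{1}_{|\alpha^{(1)}| \le b} \times \left( \sqrt{2}\kappa (1 \vee \|f\|_{W^{s,p}}) + \|y\| \right)^{|\alpha^{(2)}|} .$$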

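Define the following subset of $\mathbb{R}^{2\ell}$:

$$A_f \overset{\mathrm{def}}{=} \left\{ y \in \mathbb{R}^{2\ell} ;\ \|y\| \le \sqrt{2}\kappa (1 \vee \|f\|_{W^{s,p}}) \right\} .$$

$\|f(x) - y\|$ can be lower bounded by 0 when $y$ belongs to $A_f$ and by
$|\sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}}) - \|y\||$ when $y$ belongs to $A_f^c$. Therefore, uniformly in $x \in K^2$,

$$\exp\left\{ - \frac{\|f(x) - y\|^2}{2\sigma^2} \right\} \le \mathbb{1}_{A_f}(y) + \mathbb{1}_{A_f^c}(y)\, e^{-\frac{1}{2\sigma^2}\left( \sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}}) - \|y\| \right)^2} .$$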

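Thus, there exists a constant $C > 0$, independent from $a$, such that, for any
$y$ in $\mathbb{R}^{2\ell}$,

$$\left| D^{\alpha^{(1)}}(\langle y \rangle^b) D^{\alpha^{(2)}}(p_{f,a})(y) \right| \le C (1 \vee \|f\|_{W^{s,p}})^{|\alpha^{(2)}|} (1 + \|y\|)^{b - |\alpha^{(1)}|} \left( 1 + \frac{\|y\|}{\sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}})} \right)^{|\alpha^{(2)}|} \left[ \mathbb{1}_{A_f}(y) + \mathbb{1}_{A_f^c}(y)\, e^{-\frac{1}{2\sigma^2}\left( \sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}}) - \|y\| \right)^2} \right] .$$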

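Therefore, for any $p' > 1$,

$$\left\| D^{\alpha^{(1)}}(\langle y \rangle^b) D^{\alpha^{(2)}}(p_{f,a}) \right\|_{L_{p'}}^{p'} \le C (1 \vee \|f\|_{W^{s,p}})^{p'|\alpha^{(2)}|} (I_1 + I_2) ,$$

where

$$I_1 \overset{\mathrm{def}}{=} \int_{A_f} (1 + \|y\|)^{p'(b - |\alpha^{(1)}|)} \left( 1 + \frac{\|y\|}{\sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}})} \right)^{p'|\alpha^{(2)}|} \mathrm{d}y ,$$

$$I_2 \overset{\mathrm{def}}{=} \int_{A_f^c} (1 + \|y\|)^{p'(b - |\alpha^{(1)}|)} \left( 1 + \frac{\|y\|}{\sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}})} \right)^{p'|\alpha^{(2)}|} \times e^{-\frac{p'}{2\sigma^2}\left( \sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}}) - \|y\| \right)^2} \mathrm{d}y .$$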
By applying the change of variable $y' = \frac{y}{\sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}})}$ in $I_1$ and $I_2$, and noting
that $e^{-\frac{p'}{2\sigma^2}\left( \sqrt{2}\kappa(1 \vee \|f\|_{W^{s,p}}) \right)^2 (1 - \|y'\|)^2} \le e^{-\frac{p'\kappa^2}{\sigma^2}(1 - \|y'\|)^2}$, there exists a constant $C$
such that

$$\left\| D^{\alpha^{(1)}}(\langle y \rangle^b) D^{\alpha^{(2)}}(p_{f,a}) \right\|_{L_{p'}}^{p'} \le C (1 \vee \|f\|_{W^{s,p}})^{p'(|\alpha^{(2)}| - |\alpha^{(1)}| + b) + 2\ell} . \tag{49}$$



Using (49) in (47) with $\alpha^{(1)} = \alpha'$ and $\alpha^{(2)} = \alpha - \alpha'$ for any $|\alpha| \le s'$ and $\alpha' \le \alpha$
concludes the proof. □

Hence Lemma C.2 ensures that, for any $p' > 1$, $s' > 2\ell/p'$ and any even integer $b$,
the renormalized classes of functions $\mathcal{P}_M / M^{s'+b+2\ell/p'}$, $M \ge 1$, belong to the same
bounded subset of $W^{s',p'}(\mathbb{R}^{2\ell}, \langle y \rangle^b)$. By [22, Corollary 4], for any $p' > 1$ and
any $s' > 2\ell/p'$, provided that $b > s' + 2\ell(1 - 1/p')$, there exists a constant $C$ such
that

$$\forall \epsilon > 0, \quad H_{[]}\left( \epsilon, \mathcal{P}_M / M^{s'+b+2\ell/p'}, \|\cdot\|_{L_1(\mathbb{R}^{2\ell})} \right) \le C \epsilon^{-2\ell/s'} . \tag{50}$$

Lemma C.1 and (50) conclude the proof of Proposition B.2.